Category Archives: Artificial Intelligence

The (near) Future of AI in Math

I believe that the way mathematicians conduct research will be fundamentally different in the near future. Below, I’ll describe my unique perspective, developed from working as a mathematician and also in big tech developing AI solutions. Whilst AI will play a pivotal role, I believe the key next step is the adoption of software best practices.

Current State of Mathematical Research

This is the current lifecycle of a research paper:

  1. Brainstorm to strategize for ideas to explore
  2. Convert Idea into a concrete strategy
  3. Convince oneself that strategy will work
  4. Work on the implementation details
  5. Edit and verify correctness of paper
  6. Submit Paper to a journal
  7. Third Party reviews paper for correctness/value

To an outsider, it may come as a surprise that step (7) is often the most painful step in this journey. Reviews can take years and are somewhat subjective. Reviewers are unpaid and need to balance this duty with other competing goals. This is especially painful for early-career mathematicians. A postdoc can be awaiting a career-changing decision whilst a paper sits idly in an anonymous reviewer’s mailbox.

Verifying correctness of a paper is a thorny issue. Even after self and peer review, it is not uncommon to find papers with unfixable mathematical flaws. There are many stories of top-tier mathematicians with such errors.

Computer Formalization of Mathematical Proofs

These problems can largely be solved with software engineering. Formal theorem provers, like Lean, provide a way to confirm proofs are correct. Thus, if new papers were written in Lean, there would be no need for humans to check the validity of the proof. This may halve the work for writers of the paper and decrease the work of a reviewer (assuming they actually do check validity) by an order of magnitude.

This sounds great, but in practice there are serious challenges. First off, writing a formal proof in Lean is currently quite challenging on its own. Mathematicians typically skip details that other experts can infer. Computers do not allow such a luxury. Every tiny step must be justified. Whilst this is a challenge that is immediately apparent when starting to use Lean (see my previous blog post), I do not see this as a long-term obstacle. The use of computers allows for automation and tooling, which will ease the burden in the long run.

Tooling

Good tooling makes software development quicker, easier, less error prone, and even more enjoyable. For instance, a good IDE can allow developers to catch and avoid bugs quickly and to navigate large code bases seamlessly. Lean also offers tooling (called tactics) that allows developers to go from a statement that is “almost” proved to fully proved.
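To give a flavor of what tactics look like, here is a minimal sketch in Lean 4. It assumes only the core library (no Mathlib), and the three statements are toy examples chosen for illustration rather than anything from a real formalization project.

```lean
-- A term-mode proof: we name the library lemma explicitly.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- A tactic proof: `decide` evaluates the closed statement and checks it.
example : 2 + 2 = 4 := by decide

-- `rw` rewrites the goal with a lemma and then closes it by reflexivity.
example (a b c : Nat) : a + b + c = a + (b + c) := by
  rw [Nat.add_assoc]
```

In the first example the author has to know which lemma to cite. In the other two, the tactic does the bookkeeping, which is the kind of help that scales up as tooling improves.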

Another promising direction for good tooling is AI. Tools like GitHub Copilot and Claude Code give useful autocomplete and can even write complete code from structured requirements. Currently these tools have limited effectiveness for Lean-specific tasks, likely due to the limited data available for good Lean proofs. However, I do not see this as a long-term blocker.

There is a virtuous cycle: as the tooling gets better, more theorems will be converted to Lean, further improving the available data.

Code Migration

In order for computer formalization to work at scale, we need to formalize existing math. This is a monumental challenge. Think of this as a code migration, where:

  1. Legacy code (i.e. math papers) is not formal, with incomplete (and sometimes incorrect) implementations
  2. Contributions have been made from hundreds of thousands of developers over the course of 2000 years.
  3. The subject matter experts typically have no experience in software engineering
  4. Research communities are largely siloed, with projects usually having fewer than 5 team members

Code migrations at much smaller scales are already incredibly challenging. Currently, what exists is one large code repository that is used to host all the different code (monorepo). Inside this repository, we can find various folders containing proofs coming from different areas of mathematics. The monorepo approach is known to work for large-scale systems, such as those at Meta or Google.

The Vision

We wonder which of the steps

  1. Brainstorm to strategize for ideas to explore
  2. Convert Idea into a concrete strategy
  3. Convince oneself that strategy will work
  4. Work on the implementation details
  5. Edit and verify correctness of paper
  6. Submit Paper to a journal
  7. Third Party reviews paper for correctness/value

can be transformed by adopting the advances in software engineering and AI. Already, mathematicians are using ChatGPT to brainstorm research ideas and even come up with proofs of simple lemmas. The latter can easily save hours of time.

Converting an idea to a concrete strategy and convincing oneself it will work is a trickier task. This usually involves deep domain expertise if one wants to be at the cutting edge of research.

Writing the implementation details and checking them is a long process. This usually involves breaking a large problem into many sub-problems and then diligently verifying the correctness of every sub-problem. While not quite ready now, this is where good tooling could play a pivotal role. I believe a system that takes a statement of a lemma as an input and (usually) provides a correct formal proof as output should be within reach. We have some early signs of this, both through ChatGPT and also by using AI helpers within the Lean code base itself. Nevertheless, there is much room for improvement. For a deep dive, Terry Tao posted a series of YouTube videos working through the same lemma with several different AI helpers.

Once we have formally verified the paper, all that is left to do is for the reviewer to provide feedback to decide on whether the paper will be accepted. This usually involves understanding the novelty and impact of a paper. For domain experts, this is a quick process. Furthermore, this is something that researchers should already be doing, as it is important to have a mental map of the research in one’s area.

Conclusion

Mathematics will inevitably adopt the tooling from software engineering and AI. Painful steps in mathematical research will be largely replaced, such as verification and certain implementation details. AI and software tooling will be a force multiplier, allowing individual mathematicians to accomplish much more than they can today.

Breaking Down Problems for LLMs

We all have some problem we are trying to solve. First off, if you can convert the problem to text, or increasingly pdfs, images, video, etc., then you might be able to use AI for some or all of the solution.

Let’s focus on text for now. In this case, we have Large Language Models (LLMs), like ChatGPT, at our disposal.

Now, simply inputting your entire problem into the LLM may not work great or even at all. I talk about this at length in a recent YouTube video, focusing on a feature Amazon recently released.

Simply put, LLMs are fantastic at these sorts of tasks:

  • Summarize this
  • Answer this Question based on this context
  • Sentiment Analysis
  • Translate this
  • Generate 10 Ideas for this
  • Write Code to complete this
  • See if there are any errors in this text
  • Etc.

Thus if your original problem can be made easier to solve by solving one of these sub-problems, then LLMs might be a good tool for the job.

How one prompts the LLM makes a huge difference to the results. See another recent video of mine where I discuss some common tricks.

Once we have a sub-problem in mind, it is sometimes the case that vanilla LLMs are not enough for the job. We can soup them up with things like

Thus we have a lot of tricks up our sleeves to make use of LLMs in practice.

Amazon’s Customer Summary Feature

In a recent YouTube Video I discuss how a cool feature from Amazon could be implemented with Large Language Models.

The “Customer Says” feature, as pictured here:

allows one to

  1. See the sentiment of a product along different categories
  2. Click on each category to see some sample reviews.

In the video above, I explain how this feature can be broken down into a bunch of sub-problems, each of which can be solved with LLMs.
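As a rough illustration of what such a breakdown could look like in code, here is a minimal Python sketch. It is not necessarily the decomposition from the video: the function names, the prompt wording, and the `fake_llm` stand-in (a crude keyword rule used so the example runs without any API) are all assumptions made for this example.

```python
from collections import defaultdict

def classify_review(review, categories, llm):
    """Ask the model, per category, whether the review is positive, negative, or silent."""
    labels = {}
    for category in categories:
        prompt = (
            f"Review: {review}\n"
            f"Does this review mention '{category}'? "
            "Answer with exactly one word: positive, negative, or none."
        )
        answer = llm(prompt).strip().lower()
        if answer in ("positive", "negative"):
            labels[category] = answer
    return labels

def summarize(reviews, categories, llm):
    """Group reviews by category and sentiment so samples can be shown per category."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for review in reviews:
        for category, sentiment in classify_review(review, categories, llm).items():
            summary[category][sentiment].append(review)
    return dict(summary)

def fake_llm(prompt):
    """Hypothetical stand-in for a real model call: a keyword rule, for demonstration only."""
    review_line, question = prompt.split("\n", 1)
    category = question.split("'")[1]
    if category not in review_line.lower():
        return "none"
    return "negative" if "poor" in review_line.lower() else "positive"

reviews = ["Great battery and a sharp screen.", "Poor battery life."]
result = summarize(reviews, ["battery", "screen"], fake_llm)
print(result)
```

Swapping `fake_llm` for a real model call is the only change needed to try this on actual reviews; the aggregation step is ordinary code and needs no LLM at all.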

Temperature in Natural Language Processing

In Machine Learning, and in particular Generative AI, temperature is a useful hyperparameter for tuning model outputs.

In this post, we will discuss the following.

  • Temperature is a parameter developers can use to alter outputs from Large Language Models
  • With a higher temperature we get more creative outputs.
  • Why changing the temperature is useful.

Let’s start with an example (the code I used is at the end of this post). Consider the following prompt, taken from Google’s Minerva paper.

A line parallel to y = 4x + 6 passes through (5, 10). What is the y-coordinate of the point where this line crosses the y-axis?

I asked ChatGPT four times to answer this prompt, similar to what was done in their paper. Here is what I got:

  • Since a line parallel to y=4x+6 has the same slope, the desired line has slope 4. Therefore, the equation of the line is of the form y=4x+b. Plugging in the point (5, 10), we have 10=4(5)+b, so b=10-4(5)=-10. Therefore, the y-intercept is \boxed{-10}.
  • Since the line is parallel to y = 4x + 6, which has a slope of 4, the line must have a slope of 4. The equation of the line is then y - 10 = 4(x - 5) = 4x - 20. Setting x = 0, we find y = -20 + 10 = \boxed{-10}.
  • The given line has slope 4, so a line parallel to it also has slope 4. If a line with slope 4 passes through (5,10), then it is of the form y-10=4(x-5). Expanding the right side gives y-10=4x-20, so y=4x-10. Setting x=0, we find that the line crosses the y-axis at \boxed{-10}.
  • Since a line parallel to y=4x+6 will have the same slope, this line will have slope equal to 4. Therefore, its y-intercept has y-coordinate \boxed{-14}.

The first three answers are correct, while the last answer is wrong. Actually the final answer is correct all the way up to the 4 in 14. What likely happened under the hood is that ChatGPT was nearly sure the 4 should be a 0, but was still hedging its bets that it could in fact be a 4. In that particular instance, it randomly chose the 4.

To understand temperature, we first have to understand a bit about how large language models work. For instance, the G in GPT stands for generative. In practice, this means that given some text, the Large Language model will try to predict the next letter (or more precisely token, see this previous blog post). For a model like GPT, each letter will be assigned a probability as to how likely it is to be the next character. For instance, in the example above, the model likely assigned a large probability to the character 0 and a small, but positive, one to the character 4. It’s at this point the concept of temperature is useful.

Now that we have a bunch of probabilities assigned to each character, we have to define a methodical way of choosing the next character. Do we just assign the character with highest probability?

Unfortunately, there is no “one size fits all” solution to this problem, which is why we introduce the notion of temperature. Let’s look at an example. Suppose we are choosing between two characters to output next. Suppose further that our model overwhelmingly thinks the first character is the best choice. We plot how the temperature affects our choice of character in this example.

Here, for a low temperature (i.e. \theta close to 0), the model outputs the first character nearly all the time. But as the temperature grows larger, the model outputs each character around half the time.

Thus as we decrease the temperature, we get closer to the model that only outputs the highest probability character. As we increase the temperature, we get closer to the model that chooses each character randomly and uniformly.
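The behavior described above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the usual formulation, in which the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities; the two logit values are made up for illustration.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities, rescaled by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as 'always pick the max'")
    scaled = [score / temperature for score in logits]
    # Subtract the max before exponentiating for numerical stability.
    biggest = max(scaled)
    weights = [math.exp(s - biggest) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical scores: the model strongly prefers the first character.
logits = [4.0, 0.0]

for temperature in (0.1, 1.0, 10.0, 100.0):
    probs = apply_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At the smallest temperature nearly all of the probability sits on the first character, and at the largest the two probabilities are close to one half each.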

And just for completeness, I will mention that the temperature 0 response I got from ChatGPT in the above example was in alignment with the three correct answers.

Why is Temperature Useful?

The argument made in the aforementioned Minerva paper was that by increasing the temperature, we can have the model generate a variety of outputs. From this variety of outputs, we may pick the “best” one. How we choose the best one can vary, but what they did is just take the most popular one.

This allows us to explore the probability space of answers generated by the generative model in order to make a more informed decision about which one to proceed with.
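Picking the most popular answer is easy to sketch in Python. The snippet below assumes the final answers have already been extracted from each response (here, by hand from the four responses shown earlier); the function name is just for illustration.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer and the share of samples that agree with it."""
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)

# Final answers extracted from the four sampled responses above.
sampled_answers = ["-10", "-10", "-10", "-14"]
best, agreement = majority_vote(sampled_answers)
print(best, agreement)
```

The single wrong sample is outvoted, which is the whole point of sampling several times at a higher temperature.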

The Code

Here is the Python code I used to generate the example above. First I created a .env file in the same directory with my OpenAI API key (fill in your own API key):

OPENAI_API_KEY=

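Then I used LangChain (though this is not really required for such a simple example) as follows:

import os
from dotenv import load_dotenv  # pip install python-dotenv (assumed here as the way to read the .env file)
# These import paths match older LangChain releases; newer releases have moved them.
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain import PromptTemplate

# Load OPENAI_API_KEY from the .env file into the environment.
load_dotenv()
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
query = "A line parallel to $y = 4x + 6$ passes through $(5, 10)$. What is the $y$-coordinate of the point where this line crosses the $y$-axis?"

# The query has no template variables, so the prompt is used as-is.
prompt = PromptTemplate.from_template(query)
# Temperature 1 gives varied answers; use 0 for the most likely answer.
llm = ChatOpenAI(temperature=1, openai_api_key=OPENAI_API_KEY)

chain = LLMChain(llm=llm, prompt=prompt)

# Sample several independent responses to the same prompt.
responses = [chain.run({}) for _ in range(5)]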

How Does ChatGPT read?

How would ChatGPT read the infamous “Hello, World!”? Does it see each character sequentially:

H e l l o , W o r l d !

Or maybe it sees each word as well as the punctuation:

Hello , World !

By the end of this post we will have a full understanding of this. On the way, we will learn about Unicode, UTF-8, and byte pair encoding (BPE).

In order to understand how ChatGPT sees data, we have to understand the data on which it is trained. The majority of the data used to train GPT-3 comes from the Common Crawl dataset, which is text scraped from the internet. Thus we turn our attention to understanding how text is encoded on the web.

Code Points

In order for our computers to store and transfer text, we need a way of converting characters (i.e. elements of an alphabet, punctuation, etc.) to bits. Thanks to binary numbers it is enough to convert characters to integers (though encoding schemes like the popular UTF-8 provide a more complex and efficient conversion of code points to bits, as we will see later).

Thus we first turn our attention to mapping characters to integers, which is called a character encoding. This leads us to Unicode.

A Brief History of Unicode

One of the earliest widely used character encodings was ASCII (pronounced as-kee), which stands for the American Standard Code for Information Interchange. One key problem with it is already evident from the name: what if non-Americans would like to exchange information?

ASCII provides code points for 128 characters, including the English alphabet and common punctuation. ASCII is typically sufficient for sending English messages. You can get the ASCII encoding of the letter A (and vice versa) in Python with the following built-in functions.

print(ord("A")) #ASCII code point of A
print(chr(65)) #character of code point 65

In addition to the aforementioned symbols, there are also code points that correspond to non-printable information, which can cause some confusion.

ASCII contains most of the characters you will need if your goal is to communicate in English, and was widely adopted in the 1960s. However, ASCII cannot support languages with different alphabets, accented characters, emojis, and more.

Thus a group of people set out to create a more inclusive standard for representing text that was also backwards compatible with the already widely adopted ASCII. After several iterations, Unicode is now the widely adopted standard. It is supported by a variety of blue chip companies, as can be seen from their members page.

What is Unicode?

Unicode is a way to convert nearly 150,000 characters to integers. For instance, here is a nice list of the integer to character conversions. You can input Unicode directly into HTML via &# followed by the decimal representation and a semicolon. For instance,

<p>&#70000;</p>
<p>&#129312;</p>

renders as 𑅰 and 🤠, respectively.
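The decimal values used in such HTML references can be looked up from Python. The following is a small sketch using only the standard library: `ord` gives the code point of a character, `chr` goes the other way, and `html.unescape` resolves a numeric character reference much as a browser would.

```python
import html

# chr() maps a decimal code point to its character.
print(chr(70000))

# ord() maps a character back to its decimal code point.
cowboy_code_point = ord("🤠")
print(cowboy_code_point)

# html.unescape() resolves numeric character references like a browser does.
print(html.unescape("&#70000;"))
print(html.unescape(f"&#{cowboy_code_point};"))
```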

You can also directly write unicode on your local machine by following a tutorial (Mac, Windows, and Linux).

Thus Unicode extends ASCII to accommodate nearly all desired written text with nearly 150,000 characters assigned a code point. It turns out this encoding plays a large part in encoding text in the web and consequently the training of ChatGPT. But before we see this connection, we have to discuss UTF-8.

UTF-8

While Unicode is accommodating in terms of encoded characters, it is not terribly efficient. For instance, if you plan to write in mostly ASCII, it would make sense to make those characters require a smaller amount of space to encode. This is exactly the purpose that the UTF-8 encoding serves.

Recall that to store and send text, one needs to convert it to bits. In practice, we work with bytes, where a byte is just 8 bits. As 8 bits gives 2^8 = 256 possibilities, all of ASCII can be represented by 1 byte (with room to spare). UTF-8 is a way to convert Unicode code points to bytes in an efficient manner.

A byte can be represented by two hexadecimal digits. For instance, the numbers 0 through 20 are written in hexadecimal as:

0 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 12 13 14

So we can represent the character H, which is 72 in ASCII, by 48 in hexadecimal. UTF-8 converts each Unicode code point to 1, 2, 3, or 4 bytes. For instance, the UTF-8 encoding of

Hello 🤠

is

48 65 6c 6c 6f 20 f0 9f a4 a0

Note that the first 5 bytes correspond to H e l l o, the sixth byte (20) is the space, and the last 4 correspond to 🤠. UTF-8 is set up in such a way that it is clear the last 4 bytes are all part of one character.
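The bytes above can be reproduced with Python's built-in `str.encode`. This short sketch prints the hexadecimal view, the decimal view, and the number of bytes each character needs.

```python
text = "Hello 🤠"
encoded = text.encode("utf-8")

# Hexadecimal view, one byte per pair of hex digits.
print(encoded.hex(" "))

# Decimal view of the same bytes.
print(list(encoded))

# Number of bytes each character needs in UTF-8.
bytes_per_char = [(ch, len(ch.encode("utf-8"))) for ch in text]
print(bytes_per_char)
```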

Back to ChatGPT

Unicode has the advantage of being able to methodically encode nearly all text on the web to integers. An integer turns out to be perfect for inputting into a machine learning model. However, inputting raw Unicode, which consists of nearly 150,000 characters, would be inefficient and beyond current computational power. For instance, the encoding used for ChatGPT that we will discuss below has 100,261 tokens. Thus it is convenient to have a clever way of converting text to integers that goes beyond Unicode.

Byte Pair Encoding (BPE) is a preprocessing step that allows us to identify subwords that appear often in the text. The starting point of BPE, as the name suggests, is bytes. We start by encoding every single byte to an integer 0-255, which we call a token. Thus any Unicode text can be written as a sequence of tokens via the UTF-8 encoding. For instance, the Hello 🤠 above can be tokenized to

72 101 108 108 111 32 240 159 164 160

However, we can make this more efficient by adding additional tokens. For instance, the word “to” appears quite often in English text. However, it is currently encoded as

116 111

What we can do is create a new token for the word “to” so that instead of using 2 tokens for this common word, we only use one (this is the “pair” in BPE). Using tiktoken, released by OpenAI, we can see that this is exactly what was done.

# requires tiktoken: pip install tiktoken
import tiktoken
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # gpt-3.5-turbo is the model behind ChatGPT
print(enc.encode("to"))  # token ids for the string "to"

Running this code, we see that the integer 998 is reserved for to.

Byte Pair Encoding

So how exactly is the byte pair encoding performed? We will give a brief explanation and note the details can be found in ~20 lines of Python code in Algorithm 1 of this paper of Sennrich, Haddow, and Birch and also explained in Section 2 of the GPT-2 paper.

We start by taking a smallish sample of our text data. We then convert the text to bytes via the UTF-8 encoding. After this we see which pair of bytes appears the most often and assign a new token to that pair. We can see the first pair with the following Python code (continued from above).

print(list(enc.decode_bytes([256])))  # the bytes that make up token 256

The result is 32 32, which corresponds to two consecutive spaces. BPE then repeats this process, with the possibility of joining the newly created token to any other token. In fact, this is repeated over 100,000 times!

The first non-space join is that of i and n to form “in” (token 258).
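To make the merge loop concrete, here is a toy version in Python. It is a simplified sketch, not the training code behind any real tokenizer: it works on one short string, ignores the pre-splitting into words that production tokenizers perform, and the function names are invented for this example.

```python
from collections import Counter

def most_common_pair(tokens):
    """Return the adjacent pair of tokens that occurs most often, or None."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_token)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the UTF-8 bytes of `text`."""
    tokens = list(text.encode("utf-8"))
    merges = {}  # maps (left, right) -> new token id
    for step in range(num_merges):
        pair = most_common_pair(tokens)
        if pair is None:
            break
        new_token = 256 + step  # ids 0-255 are reserved for raw bytes
        merges[pair] = new_token
        tokens = merge_pair(tokens, pair, new_token)
    return tokens, merges

tokens, merges = train_bpe("in the inn, the thin tin", num_merges=3)
print(merges)
print(tokens)
```

Each merge shortens the token sequence, and later merges can build on earlier ones, which is how whole words eventually end up as single tokens.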

It is worth mentioning that BPE is not the only method of tokenizing. For instance, Google’s Bard uses SentencePiece.

One Issue

It is well known that not every byte sequence is valid UTF-8. Thus, it is possible in theory for ChatGPT to produce invalid UTF-8. Of course, this becomes increasingly rare as the model is trained more and more. In fact, the decoder provided by tiktoken has a keyword argument to specify how to address this exact issue.
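The same issue can be seen with Python's built-in `bytes.decode`, whose `errors` argument chooses between raising and substituting a replacement character. The sketch below uses only the standard library; as far as I know tiktoken's `decode` exposes a similar `errors` option, but treat that as an assumption and check its documentation.

```python
# The first three bytes of the four-byte encoding of 🤠: an incomplete character.
truncated = bytes([0xF0, 0x9F, 0xA4])

try:
    truncated.decode("utf-8")  # the default is errors="strict"
    strict_failed = False
except UnicodeDecodeError as err:
    strict_failed = True
    print("strict decoding failed:", err.reason)

# errors="replace" substitutes U+FFFD (the replacement character) instead of raising.
replaced = truncated.decode("utf-8", errors="replace")
print(repr(replaced))
```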

Recap

To see how ChatGPT is trained, we first have to understand the data. The data is scraped from the web, which led us to Unicode and the UTF-8 encoding. Unicode covers nearly 150,000 characters, and feeding these to a model directly is inefficient. This motivates looking at a compression technique, i.e. the Byte Pair Encoding.

Using AI to Write Math

Unfortunately (or perhaps, fortunately), we are still far from the days where we can ask a computer to write proofs for us. However, there are tools available today that can concretely assist with writing mathematics.

I made a video on the topic, with blog post below.

While I left research mathematics some time ago, I still find myself typing up some math from time to time. For me this has become a bit easier with GitHub Copilot.

I am now using VSCode as my text editor to write LaTeX. VSCode is a free and powerful IDE used by millions of software engineers. With it, one can access GitHub Copilot (for $100/year).

GitHub Copilot saves quite a bit of time when writing LaTeX. Let’s see it in action suggesting useful LaTeX code for a matrix.

The lighter text “\begin{pmatrix}” is what is suggested by GitHub Copilot. You can simply hit tab to accept the suggestion or keep typing to reject it. Let’s keep accepting:

This eventually gives the final result.

As you can see, it quite accurately gives the LaTeX code for a matrix, and takes only about 3 seconds in real time to do so.
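For readers who have not typed one before, a matrix in LaTeX looks roughly like the following. This is a generic illustration rather than the exact output from the session above, and it assumes the amsmath package is loaded for the pmatrix environment.

```latex
\[
A = \begin{pmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{pmatrix}
\]
```

Most of the keystrokes here are repetitive structure (ampersands, line breaks, matching begin and end), which is the kind of text an autocomplete tool is good at filling in.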

Funnily enough, it suggests some nonsense afterwards, about A having integer entries and some other things.

VSCode has a lot of flexibility in itself. There are many extensions that are useful. For instance, to work with LaTeX, you’ll at the very least need a LaTeX extension and a PDF viewer extension. There is also git integration, which allows you to store your work remotely with the click of a couple of buttons.

Another reason to get familiar with VSCode is that it is a powerful tool for programming. That way, if you do ever need to do some programming, you will already have a leg up on getting started!

I started a YouTube Channel

My First Video

I am happy to announce that I just posted my first video to a new YouTube Channel. The result is an interview with Xavier Garcia about how Chat GPT works.

What’s Next?

I’ll continue to post videos surrounding Machine Learning, Data Science, and perhaps elsewhere. I plan to do more interviews, individual videos about relevant topics, as well as some educational material.

The Process

I was already meeting with Xavier regularly to discuss Machine Learning topics, and so I thought to myself that other people might benefit from our discussions.

Our discussion for the video was perhaps a bit different from usual, as we were a bit less likely to say things “off the cuff.” I’ll be looking to make this easier going forward, as I think a lot of valuable insight lies there.

There was/is a learning curve in making videos, and I’ll definitely be looking to improve the quality as time goes forward. From zero to published video took a couple months (of course not doing this full time). Also, I got lucky and found a very helpful video editor which made things a lot easier. Things like sound quality, video quality, and lighting are not things I’ve had to worry about too much in the past. I didn’t invest too much in equipment, just trying to make use of what I had.

Can Chat-GPT Do Math?

Chat-GPT is an impressive new AI chatbot released by OpenAI. Impressive applications of it can be found all over the internet. But can it do math?

By math, we do not mean simply perform computations. Its own design ensures that there will be computational problems it will be unable to solve. What I am more interested in is if it can solve problems that require some mathematical reasoning.

To choose our problems, we use the MMLU dataset. Galactica, a large language model recently released by Meta AI, has achieved some good results on this dataset. Their findings are in section 5.3 of their paper.

In what follows, we’ll pick a few problems and see how Chat-GPT handles them. If it fails, we will try to reword the question in hopes of achieving some partial results.

We note that the MMLU dataset has multiple choice questions, but we present only the questions to chat-GPT. We choose questions only from the abstract algebra section of the MMLU. Afterwards, we’ll prod chat-GPT for intuition and a proof. We’ll end by suggesting a few applications of chat-GPT for mathematicians.

Question 1

Me: The inverse of -i in the multiplicative group, {1, -1, i , -i } is

Chat-GPT: The inverse of -i in the multiplicative group {1, -1, i, -i} is i. The inverse of an element x in a group is an element y such that xy=yx=1, where 1 is the identity element of the group. In this group, 1 is the identity element, so the inverse of an element x is an element y such that x*y=1.

For example, the inverse of i is -i, because i*(-i)=1. Similarly, the inverse of -i is i, because (-i)*i=1.

Remark: Chat-GPT answers this question perfectly and gives an excellent explanation. To be fair, this is an easier question.
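The answer is easy to double check with Python's built-in complex numbers, where `1j` plays the role of i. The helper name below is just for illustration.

```python
# The group {1, -1, i, -i} under multiplication, using Python's complex numbers.
group = [1, -1, 1j, -1j]

def inverse(x, elements):
    """Return the element y of `elements` with x * y == 1."""
    for y in elements:
        if x * y == 1:
            return y
    raise ValueError(f"{x} has no inverse in the given set")

for x in group:
    print(x, "has inverse", inverse(x, group))
```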

Question 2

Me: How many homomorphisms are there of Z into Z_2? Please write in latex format.

Chat-GPT: There are two homomorphisms of \mathbb{Z} into \mathbb{Z}_2. They are defined as follows:

f_1:\mathbb{Z}\rightarrow\mathbb{Z}_2 such that f_1(n)=n\pmod{2} for all n\in\mathbb{Z}

f_2:\mathbb{Z}\rightarrow\mathbb{Z}_2 such that f_2(n)=0 for all n\in\mathbb{Z}

Remark: Again, Chat-GPT is spot on. However, I asked this question a few times and it once said 1, and another time said two but gave a false justification:

There are only two homomorphisms of Z (the integers) into Z2 (the integers modulo 2).

One homomorphism sends every integer to 0, and the other homomorphism sends every integer to 1.
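A quick sanity check in Python shows why the second justification fails. A homomorphism must satisfy f(a + b) = f(a) + f(b) in Z_2, and the sketch below tests this on a finite sample of integers. That is evidence rather than a proof, and the sample range is an arbitrary choice.

```python
def is_homomorphism(f, sample=range(-10, 11)):
    """Check f(a + b) == f(a) + f(b) (mod 2) on a finite sample of integers."""
    return all(f(a + b) == (f(a) + f(b)) % 2 for a in sample for b in sample)

candidates = {
    "n mod 2": lambda n: n % 2,
    "constant 0": lambda n: 0,
    "constant 1": lambda n: 1,  # the map from the incorrect justification
}

for name, f in candidates.items():
    print(name, is_homomorphism(f))
```

The constant map to 1 already fails at a = b = 0, since a homomorphism must send the identity 0 to the identity 0.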

Question 3

Me: Let p = (1, 2, 5, 4)(2, 3) in S_5 . Find the index of <p> in S_5. Please write your answer in latex.

Chat-GPT:

The index of <p> in S_5 is the number of elements in S_5 that come before <p> in lexicographic order.

To find the index of <p>, we must first determine the cycle type of p. The cycle type of a permutation is a list of the lengths of its disjoint cycles, listed in decreasing order.

The cycle type of p is [2,2], since p consists of a 2-cycle and a 2-cycle.

Now we can use the cycle index of S_5 to determine the number of permutations in S_5 with cycle type [2,2].

The cycle index of S_5 is given by:

Z(S_5) = \frac{1}{5!} \sum_{i=1}^{5} a_i p_i^{5}

where a_i is the number of permutations in S_5 with i cycles, and p_i is a variable representing a cycle of length i.

Substituting the cycle type [2,2] into the cycle index gives:

\frac{1}{5!} (p_2^2 + p_2^2)

This simplifies to:

\frac{1}{120} (p_2^2 + p_2^2) = \frac{1}{60} (2p_2^2) = \frac{1}{30} p_2^2

The coefficient of p_2^2 in this expression is \frac{1}{30}, so the number of permutations in S_5 with cycle type [2,2] is \frac{1}{30}.

Since <p> is the only permutation in S_5 with cycle type [2,2], it is the \frac{1}{30}-th permutation in S_5.

Therefore, the index of <p> in S_5 is \boxed{\frac{1}{30}}.

Remark: Woah! This is very wrong! The very first sentence is completely wrong. Also, its final conclusion, 1/30, is not even an integer, which is certainly not right. Indeed, the index of a subgroup counts something and must be a positive integer.

We are able to salvage something though. Suppose we computed that the order of <p> is 5, using our algebra skills. Then we can prompt chat-GPT with the following.

Me: Now suppose H \leq S_5 is a subgroup of order 5. What is the index of H in S_5?

Chat-GPT: The index of a subgroup H in a group G is the number of left cosets of H in G. This is also equal to the number of right cosets of H in G.

Since the order of a subgroup divides the order of the group, the order of H must divide the order of S_5, which is 120. The only divisors of 120 that are less than 120 are 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, and 120. Therefore, the only possible orders for H are 1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, and 120.

Since the order of H is 5, the index of H in S_5 is \frac{120}{5} = \boxed{24}.

Remark: Here chat-GPT reaches the correct answer with a good explanation, although its list of divisors of 120 is sloppy (it omits 60 and includes 120 itself). Perhaps it would be nice to mention that the main fact being used is Lagrange’s theorem, though this can easily be obtained by asking it “What theorem did you use above?”
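For completeness, the order of p and the resulting index can be computed directly. The sketch below represents permutations as dictionaries and multiplies cycles right to left; that convention is an assumption, though the left-to-right convention also gives a 5-cycle here. The helper names are made up for this example.

```python
from math import factorial

def cycle_to_map(cycle, n):
    """Return the permutation of {1..n} given by one cycle, as a dict."""
    mapping = {k: k for k in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        mapping[a] = b
    return mapping

def compose(f, g):
    """Return the permutation 'apply g first, then f'."""
    return {k: f[g[k]] for k in g}

def order(perm):
    """Smallest k >= 1 such that perm applied k times is the identity."""
    identity = {k: k for k in perm}
    power, k = dict(perm), 1
    while power != identity:
        power = compose(perm, power)
        k += 1
    return k

n = 5
# p = (1, 2, 5, 4)(2, 3), reading the product right to left.
p = compose(cycle_to_map((1, 2, 5, 4), n), cycle_to_map((2, 3), n))
subgroup_order = order(p)  # the cyclic subgroup <p> has as many elements as the order of p
index = factorial(n) // subgroup_order  # Lagrange: |S_5| / |<p>|
print(p, subgroup_order, index)
```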

But what’s the intuition?

What makes a research mathematician great is a certain “intuition” for problems. Let’s see if chat-GPT can explain any of that.

Me: Can you explain the intuition behind the Cauchy-Schwarz inequality? Please write any math formula in latex.

Chat-GPT: Sure! The Cauchy-Schwarz inequality is a useful inequality that arises in various areas of mathematics, such as vector algebra, inner product spaces, and probability theory. It states that for any two vectors x and y, the following inequality holds:

\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)

Intuitively, the Cauchy-Schwarz inequality states that the inner product of two vectors x and y is “controlled” by the lengths of the vectors x and y. To see this, consider the special case where x and y are unit vectors (i.e., vectors of length 1). In this case, the inequality becomes:

\left(\sum_{i=1}^n x_i y_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right) = 1 \cdot 1 = 1

which means that the inner product of x and y is always less than or equal to 1. This is consistent with our intuition that the inner product of two unit vectors should be “small” if they are “orthogonal” (i.e., perpendicular) and “large” if they are “parallel” (i.e., pointing in the same direction).

I hope this helps! Let me know if you have any questions.

Remark: This is not too bad, but could certainly be improved. For instance, explaining when equality holds in Cauchy-Schwarz (when one vector is a multiple of the other) helps give a more precise meaning to the last paragraph. See the first few paragraphs of this blog post of Terence Tao for an explanation from one of the world’s leading mathematicians. Also, here is a Math Stack Exchange thread on the topic.
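The equality case is easy to see numerically. The sketch below computes the gap between the two sides of the inequality for a pair of random vectors and for a pair of parallel vectors; the seed and dimension are arbitrary choices.

```python
import random

def dot(x, y):
    """Inner product of two equal-length lists of numbers."""
    return sum(a * b for a, b in zip(x, y))

def cauchy_schwarz_gap(x, y):
    """Return (sum x_i^2)(sum y_i^2) - (sum x_i y_i)^2, which should be >= 0."""
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(5)]
y = [random.uniform(-1, 1) for _ in range(5)]

# Generic vectors: the gap is positive.
print(cauchy_schwarz_gap(x, y))

# Parallel vectors: the gap is zero, up to floating point rounding.
print(cauchy_schwarz_gap(x, [3 * a for a in x]))
```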

Prove it.

Me: Can you prove it?

Chat-GPT: Sure! Here is a proof of the Cauchy-Schwarz inequality:

First, we can rewrite the inequality as:

\left(\sum_{i=1}^n x_i y_i\right)^2 - \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right) \le 0

Next, we can expand the left-hand side as follows:

\sum_{i=1}^n x_i^2 y_i^2 + \sum_{i \ne j} x_i x_j y_i y_j - \sum_{i=1}^n x_i^2 y_i^2 - \sum_{i \ne j} x_i^2 y_j^2 \le 0

This simplifies to:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \sum_{i \ne j} (x_i y_j)^2 \le 0

Now, we can apply the arithmetic-geometric mean inequality to each term on the right-hand side:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \sum_{i \ne j} [(x_i y_j)^2]^{\frac{1}{2}}[(x_i y_j)^2]^{\frac{1}{2}} \le 0

This simplifies to:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \left(\sum_{i \ne j} [(x_i y_j)^2]^{\frac{1}{2}}\right)^2 \le 0

Finally, we can apply the arithmetic-geometric mean inequality again to the term on the right-hand side:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \left[\prod_{i \ne j} [(x_i y_j)^2]^{\frac{1}{2}}\right]^{\frac{2}{n-1}} \le 0

This simplifies to:

\sum_{i \ne j} (x_i y_i)(x_j y_j) - \prod_{i \ne j} (x_i y_j) \le 0

And we are done! This completes the proof of the Cauchy-Schwarz inequality.

I hope this helps! Let me know if you have any questions.

Remark: Unfortunately, this proof is not correct. Both the overall method and the details (e.g. its application of AM-GM) are severely flawed.
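For comparison, here is one standard correct argument, via Lagrange's identity. It is not a repair of Chat-GPT's attempt, just a short proof in the same notation.

```latex
\begin{align*}
\left(\sum_{i=1}^n x_i^2\right)\left(\sum_{j=1}^n y_j^2\right) - \left(\sum_{i=1}^n x_i y_i\right)^2
  &= \sum_{i,j=1}^n x_i^2 y_j^2 - \sum_{i,j=1}^n x_i y_i x_j y_j \\
  &= \frac{1}{2} \sum_{i,j=1}^n \left( x_i^2 y_j^2 + x_j^2 y_i^2 - 2 x_i y_i x_j y_j \right) \\
  &= \frac{1}{2} \sum_{i,j=1}^n \left( x_i y_j - x_j y_i \right)^2 \;\ge\; 0 .
\end{align*}
```

The last expression is a sum of squares, so it is non-negative, and it vanishes exactly when x_i y_j = x_j y_i for all i and j, that is, when one vector is a multiple of the other.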

Conclusion

To be honest, Chat-GPT is quite impressive for a model that is not tailored for mathematics in any way. I think it could already, in its current format, help a mathematician to prepare lecture notes or even easier parts of a paper. Perhaps its greatest utility would come in automating certain parts of grant and job applications.