# On the largest sum-free subset problem in the integers

I recently uploaded “On the largest sum-free subset problem in the integers,” to the arXiv.

Let $A \subset \mathbb{Z}$ be a finite subset of the integers. We say $A$ is sum-free if there are no solutions to

$a + b = c,$

with $a,b,c \in A$. We define $S(A)$ to be the size of the largest sum-free subset of $A$. We seek lower bounds for $S(A)$. It is conjectured that

$S(A) \geq (|A|+C)/3, \ \ \ \ \ (1)$

for any constant $C > 0$ (provided $|A|$ is sufficiently large in terms of $C$). Erdős established that $C=1$ is admissible and Bourgain later improved this to $C=2$. On the other hand, by a construction, Eberhard, Green, and Manners showed that $C = o(|A|)$ is the most one can hope for.

I was originally drawn to this problem for two reasons. The first is that the aforementioned result of Erdős is the first additive combinatorics result in Tao and Vu’s additive combinatorics book. The second is that Bourgain’s original proof had a reputation for being quite difficult.

We now sketch that $C=1$ is admissible, as shown by Erdős. The first idea is that the set $[1/3,2/3) \subset \mathbb{R}/\mathbb{Z}$ is sum-free. Thus any subset of this set is also sum-free. Note this set has measure $1/3$, which is the same as the multiplicative constant in $(1)$.

The second idea is to randomly map $A$ into $[1/3,2/3)$. Indeed choosing

$\theta \in \mathbb{R} / \mathbb{Z}$

at random, we consider the set

$A_\theta := \{a \in A : \theta a \in [1/3,2/3)\} \subset A$.

One can check that this set on average has size $|A|/3$ and, as mentioned before, is sum-free: if $a+b = c$ with $a,b,c \in A_\theta$, then $\theta c = \theta a + \theta b$ lies in $[2/3,4/3)$ modulo $1$, which is disjoint from $[1/3,2/3)$.
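The averaging argument above is easy to simulate. The following is a minimal numerical sketch (the set $A$ is an arbitrary choice for illustration): a random dilation $\theta$ yields a sum-free subset of $A$ of average size about $|A|/3$.

```python
import random

# A numerical sketch of the Erdős argument: for a random dilation theta,
# the elements a of A with theta*a mod 1 in [1/3, 2/3) form a sum-free
# set whose average size is |A|/3. The set A is an arbitrary choice.

def sum_free(S):
    """Check that a + b = c has no solutions with a, b, c in S."""
    S = set(S)
    return all(a + b not in S for a in S for b in S)

random.seed(0)
A = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]  # |A| = 10
sizes = []
for _ in range(2000):
    theta = random.random()
    B = [a for a in A if 1/3 <= (theta * a) % 1 < 2/3]
    assert sum_free(B)
    sizes.append(len(B))
print(sum(sizes) / len(sizes))  # close to |A|/3 = 3.33...
```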

Bourgain’s work, and also our work, involves more careful choices of $\theta$. Underpinning the work is the idea of viewing $f = 1_{[1/3,2/3)} - 1/3$ as a function, $f: \mathbb{R}/\mathbb{Z} \to \mathbb{C}$, on the torus and applying a combination of Fourier techniques and combinatorial techniques.

For a set $S$, we let

$f_S(x) = \sum_{s \in S} f(sx).$

Then the Erdős argument above may be restated as $\int f_A = 0$. Furthermore, $(1)$ would follow from establishing that there is an $x\in \mathbb{R}/\mathbb{Z}$ satisfying

$f_A(x) \geq C/3.$

One new idea in our work is to partition $A$ into $A_0$ and $A_1$, where $A_1$ is the set of elements in $A$ that are divisible by $3$. It turns out that this decomposition is useful as

$f_{A_1}(x) = f_{A_1}(x+1/3) = f_{A_1}(x+2/3),$

while

$f_{A_0}(x) + f_{A_0}(x+1/3) + f_{A_0}(x+2/3) = 0.$

Thus, for instance, a short argument reveals that if one can establish $f_{A_1}(x) \geq C/3$ for some $x$, then $(1)$ follows for $A$.
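Both identities can be verified exactly in rational arithmetic. Below is a small sketch (the sets $A_0$, $A_1$ and the sample points are arbitrary choices for illustration):

```python
from fractions import Fraction

# An exact check of the two displayed identities, with
# f = 1_{[1/3,2/3)} - 1/3 and f_S(x) = sum over s in S of f(sx).
# A1 consists of multiples of 3; A0 of non-multiples of 3.

def f(t):
    t = t % 1
    return (1 if Fraction(1, 3) <= t < Fraction(2, 3) else 0) - Fraction(1, 3)

def f_S(S, x):
    return sum(f(s * x) for s in S)

third = Fraction(1, 3)
A1 = [3, 6, 9, 21]       # arbitrary elements divisible by 3
A0 = [1, 2, 5, 7, 11]    # arbitrary elements not divisible by 3

for k in range(1, 50):
    x = Fraction(k, 97)  # arbitrary rational sample points
    assert f_S(A1, x) == f_S(A1, x + third) == f_S(A1, x + 2 * third)
    assert f_S(A0, x) + f_S(A0, x + third) + f_S(A0, x + 2 * third) == 0
print("identities verified at 49 sample points")
```

The first identity holds since $s(x+1/3) = sx + s/3$ with $s/3$ an integer; the second since exactly one of $t, t+1/3, t+2/3$ lies in $[1/3,2/3)$ modulo $1$.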

# Effective results on the size and structure of sumsets

For a set ${A \subset \mathbb{Z}^d}$ we define the ${N}$-fold sumset of ${A}$ via

$\displaystyle NA : = \{a_1 + \ldots + a_N : a_i \in A\}.$

We have the following result of Khovanskii from 1992.

Theorem 1 (Polynomial size of iterated sumsets): Let ${A \subset \mathbb{Z}^d}$ be a set of size ${n}$. Then there exist ${N_A \geq 1}$ and ${c_0 , \ldots , c_d \in \mathbb{Q}}$ such that for ${N \geq N_A}$

$\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit$

In one part of a recent work, Andrew Granville, Aled Walker, and I were able to provide effective bounds for ${N_A}$. We let

$\displaystyle w(A) = \max_{a , b \in A} ||a-b||_{\infty}.$

Theorem 2 (Effective Khovanskii): Let ${A \subset \mathbb{Z}^d}$ be a set of size ${n}$. Then there exist ${c_0 , \ldots , c_d \in \mathbb{Q}}$ such that for any

$\displaystyle N \geq 2^{2n} n^{n^3 + 1} w(A)^{n^3},\ \ \ \ \ (1)$

one has

$\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit$

It is not clear that (1) is best possible. Certainly there has to be some dependence on how ${A}$ lies in space, as can be seen by taking the elements of ${A}$ to be increasingly spread out. Curran and Goldmakher showed that when ${|A| = d+2}$, one needs ${N_A}$ to be at least of order ${w(A)^d}$.

We first recall an elegant proof of Theorem 1 due to Nathanson and Ruzsa, which is also the starting point for Theorem 2.

Proof of Theorem 1: We consider ${\mathbb{Z}_{\geq 0}^n}$, equipped with ${\leq}$, the lexicographical ordering. Let ${A = \{a_1 ,\ldots , a_n\}}$. We then have a map

$\displaystyle \phi : \mathbb{Z}_{\geq 0}^n \rightarrow \bigcup_{j=0}^{\infty} jA,$

via

$\displaystyle \phi(x_1 , \ldots , x_n) =x_1 a_1 + \ldots + x_n a_n.$

It is worth noting that if ${\phi}$ is injective, we immediately deduce Theorem 1 by stars and bars. Typically ${\phi}$ is not injective, but this turns out not to be a significant problem. We let ${U}$ be the set of elements ${x}$ such that there exists a ${y}$ with ${||y||_1 = ||x||_1}$, ${y < x}$, and ${\phi(y) = \phi(x)}$. We call any element of ${U}$ useless. They are deemed useless as

$\displaystyle |NA| = |\{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U|.$

There is nothing intrinsic that makes elements of ${U}$ useless; rather, it is a consequence of our choice of lexicographical ordering. One can check that ${U}$ is closed under translation by elements of ${\mathbb{Z}_{\geq 0}^n}$.

We need a way to count the elements of ${U}$ and thus propose another definition. We let ${\leq_{{\rm unif}}}$ be the partial ordering where ${x \leq_{{\rm unif}} y}$ if ${x}$ is smaller than ${y}$ coordinate-wise. Let ${U_{\min}}$ be the elements ${x \in U}$ such that there is no ${y\in U}$ with ${y <_{{\rm unif}} x}$. Dickson’s lemma implies that ${U_{\min}}$ is finite. For a set ${U'}$, we let

$\displaystyle B(N, U') = \{x \in \mathbb{Z}_{\geq 0 }^n: ||x||_1 = N , \ x \geq_{{\rm unif}} u \ \text{for all} \ u \in U'\}.$

Thus we have, by the inclusion-exclusion principle,

$\displaystyle | \{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U| = \sum_{U' \subset U_{\min}} (-1)^{|U'|}| B(N , U')|.$

Thus it is enough to show that for any finite ${U'}$, ${|B(N,U')|}$ is polynomial in ${N}$, for ${N}$ large enough. This follows from the same stars and bars argument mentioned above, as long as

$\displaystyle N \geq \max_{u \in U'} ||u||_{\infty}, \ \ \ \ \ (2)$

and Theorem 1 follows. ${\spadesuit}$
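One can watch Khovanskii’s theorem kick in numerically. Here is a small sketch in the simplest case ${d = 1}$, with the arbitrary example set ${A = \{0,3,5\}}$: the sizes ${|NA|}$ agree with a degree-1 polynomial only once ${N}$ is large enough, which we detect via second finite differences.

```python
# A numerical illustration (a sketch, d = 1, with the arbitrary set
# A = {0, 3, 5}): |NA| agrees with a degree-1 polynomial in N only once
# N >= 3, detected by the second finite differences eventually vanishing.

def sumset_sizes(A, N_max):
    """Return [|1A|, |2A|, ..., |N_max A|]; since 0 is in A, iterating
    S -> S + A computes the N-fold sumsets."""
    sizes = []
    S = {0}
    for _ in range(N_max):
        S = {x + a for x in S for a in A}
        sizes.append(len(S))
    return sizes

sizes = sumset_sizes([0, 3, 5], 10)
# Second differences of a degree-1 polynomial vanish.
d2 = [sizes[i] - 2 * sizes[i + 1] + sizes[i + 2] for i in range(len(sizes) - 2)]
print(sizes)  # [3, 6, 10, 15, 20, 25, 30, 35, 40, 45]
print(d2)     # [1, 1, 0, 0, 0, 0, 0, 0]: here |NA| = 5N - 5 once N >= 3
```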

Note that this proof does not give any effective bound on ${N_A}$, as we do not have any control over the set ${U_{\min}}$. In particular, one would like to have a bound on the ${\ell^{\infty}}$ norm of the elements of ${U_{\min}}$, in light of (2). In general, one cannot make Dickson’s lemma quantitative, but in our case we can use the structure of ${U_{\min}}$ to do so. The point is that ${U}$ is defined by linear relations, so one can appeal to Siegel’s lemma.

Proof (sketch) of Theorem 2: We translate ${A}$ so that ${0 \in A}$, which does not affect the problem (in particular, ${w(A)}$ remains unchanged). We build upon the proof of Theorem 1. Suppose ${x \in U_{\min}}$. As ${x \in U}$, there is a ${y \in \mathbb{Z}_{\geq 0}^n}$ such that ${||x||_1 = ||y||_1}$, ${y < x}$, and ${\phi(x) = \phi(y)}$. Writing ${I}$ and ${J}$ for the supports of ${x}$ and ${y}$, we thus have

$\displaystyle \sum_{i \in I} x_i a_i = \sum_{j \in J} y_j a_j.$

As ${x \in U_{\min}}$, one can check that ${I\cap J = \emptyset}$. We now construct a matrix ${M}$ as follows. The first row has ${\#I}$ 1’s followed by ${\#J}$ ${-1}$‘s. The remaining ${d}$ rows are given by ${(a_i)_{i \in I}}$ and ${(-a_j)_{j \in J}}$. Thus

$\displaystyle M (x,y)^T = 0. \ \ \ \ \ (3)$

One would hope to apply Siegel’s lemma, which asserts that (3) has a nonzero integer solution ${x^*}$ with ${||x^*||_{\infty}}$ small. The primary issue is that this small solution ${x^*}$ may have nothing to do with ${(x,y)^T}$. However, one can translate ${(x,y)^T}$ by a multiple of ${x^*}$ to create a solution that is small in a single coordinate. Then one “forgets” this coordinate and proceeds by induction. A secondary issue is that ${x^* \in \mathbb{Z}^n}$ may have negative coordinates, but this turns out not to be a critical issue. All of this is carried out in section 6 of the aforementioned paper with Granville and Walker. ${\spadesuit}$

# A large gap in a dilate of a set

I recently uploaded “A large gap in a dilate of a set” to the arXiv. We prove the following.

${\ }$ Theorem 1: Let ${A \subset \mathbb{F}_p}$ have at least two elements. Suppose ${N \leq 2p/|A| -2}$. Then there are ${x \in \mathbb{F}_p}$ and ${d \in \mathbb{F}_p^{\times}}$ such that

$\displaystyle (d\cdot A + x ) \cap \{1 , \ldots , N\} = \emptyset. \ \ \ \spadesuit$

${\ }$

As the note is only 3 pages, we do not remark on the proof (which uses the polynomial method) but instead elaborate on some related topics. Note that by the pigeonhole principle, Theorem 1 is true for every ${d}$ if we only insist ${N \leq p/|A| - 1}$. Thus we have effectively doubled the bound coming from the pigeonhole principle. Note that without dilation, Theorem 1 is false, as one can take ${A}$ to consist of ${|A|}$ equally spaced elements.
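Theorem 1 is also easy to confirm by brute force for a small prime. Here is a sketch (with arbitrary small choices of ${p}$ and ${A}$):

```python
# A brute-force check of Theorem 1 for a small prime (a sketch; p and A
# are arbitrary small choices). We search for d and x such that
# (d*A + x) avoids {1, ..., N}, with N = 2p/|A| - 2 (rounded down).

p = 31
A = [1, 2, 4, 8, 16]       # |A| = 5
N = 2 * p // len(A) - 2    # N = 10

found = False
for d in range(1, p):
    for x in range(p):
        if not any((d * a + x) % p in range(1, N + 1) for a in A):
            found = True
            break
    if found:
        break
print(found, d, x)  # found is True, as Theorem 1 guarantees
```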

One can ask what happens in Theorem 1 if one does not allow translation by ${x}$. It turns out that one cannot then hope to go beyond ${N \sim 2p/|A|}$, as was shown in this paper of Chen, Shparlinski, and Winterhof, using the example

$\displaystyle A = \{1 , \ldots , p/N\} \cup -\{1 , \ldots , p/N\} .$

It is an interesting question to decide to what extent Theorem 1 remains true without translation by ${x}$. We remark this is in a similar spirit to the lonely runner conjecture.

It would be nice if Theorem 1 were true for ${N \leq C p/|A|}$ for every constant ${C}$, even in the special case when ${|A| \sim \sqrt{p}}$. Certainly this is true for a random set, even without the need for dilation.

In particular, this would give us hope of answering an old question of Erdős, which we recall. A Sidon set in an abelian group is a set ${A}$ such that ${a+b = c+d}$ with ${a,b,c,d \in A}$ implies ${\{a,b\} = \{c,d\}}$. Let ${r_2(N)}$ be the size of the largest Sidon set contained in ${\{1 , \ldots , N\}}$. Erdős asked if

$\displaystyle r_2(N) = N^{1/2} + O(1).$

There are constructions of Sidon sets of size ${\sim N^{1/2}}$ coming from ${ \mathbb{Z} /N \mathbb{Z}}$ for well-chosen ${N}$. The hope would be to dilate such a set in ${\mathbb{Z}/N \mathbb{Z}}$ so that it has a large gap of size ${g}$, thus finding a Sidon set inside ${\{1 , \ldots , N-g\}}$ in place of ${\{1 , \ldots , N\}}$. It is actually not known if we can take ${N}$ to be a prime in the modular construction, which may be found in this nice survey of O’Bryant. This is certainly a question of interest.

On the other hand, one can hope to improve Theorem 1 for some of these constructions. It turns out one can easily check that Ruzsa’s construction, a set ${A \subset \mathbb{Z}/ (p(p-1)) \mathbb{Z}}$, does not admit large gaps. Indeed, the set has size ${\sim p}$, but no dilate of ${A}$ contains a gap significantly longer than ${2p}$. This also shows that Theorem 1 is false for general cyclic groups. The point is that in his construction the natural projection to ${\mathbb{Z}/(p-1)\mathbb{Z}}$ maps ${A}$ surjectively.

This seems to be a bit of a red herring in the application to Sidon sets. On the other hand, for the well-known construction of Bose-Chowla (contained in the aforementioned survey), the analog of Theorem 1 is true and there is no reason to suspect that it cannot be improved. In fact, in this case a proof also proceeds by the polynomial method.

# An Analytic Approach to the Cardinality of Sumsets

Dávid Matolcsi, Imre Ruzsa, Dmitrii Zhelezov, and I just uploaded our paper, “An analytic approach to cardinalities of sumsets,” to the arXiv. Alongside Ben Green, we also uploaded a follow-up paper, “A weighted Prékopa-Leindler inequality and sumsets with quasi-cubes.”

Our focus is sumsets of finite subsets of ${\mathbb{Z}^d}$. For instance, if ${A \subset \mathbb{Z}}$ and ${d}$ is a positive integer, we have

$\displaystyle |A^d + A^d| =|A+A|^d.$

If ${A}$ is not an arithmetic progression, it is known that

$\displaystyle |A+A| \geq 2|A|,$

and so we obtain

$\displaystyle |A^d + A^d| \geq 2^d |A^d|.\ \ \ \ \ (1)$

It is natural to look to obtain analogs of (1) for more general subsets of ${\mathbb{Z}^d}$. For instance, the Brunn-Minkowski inequality implies the continuous analog,

$\displaystyle \mu(X+X) \geq 2^d \mu(X),$

whenever ${X}$ is a compact subset of ${\mathbb{R}^d}$. In the discrete setting such a result is not true. First of all, the notion of cardinality does not distinguish dimension: we can take a one-dimensional arithmetic progression and place it in ${\mathbb{Z}^d}$, which will not attain the growth in (1). Even for a legitimately ${d}$-dimensional set, one can take an arithmetic progression alongside ${d-1}$ random points, and then ${|A+A| / |A|}$ grows only linearly in ${d}$. There are some situations where we can establish (1). For instance, if ${A}$ contains ${\{0,1\}^d}$, then Green and Tao showed

$\displaystyle |A+A| \geq 2^{d/2} |A|.$

Freiman (see chapter 5 of Tao and Vu) also showed that if ${A \subset \mathbb{Z}^d}$ is such that every hyperplane intersects ${A}$ in ${\ll_{d, \epsilon} |A|}$ points, then

$\displaystyle |A+A| \geq (2^d - \epsilon) |A|.$

We also mention the work of Gardner and Gronchi, who prove inequalities for general ${d}$-dimensional sets. The drawback there is that the extremal examples are nearly one-dimensional, and in particular they only derive growth that is linear in the dimension. We provide a result in a similar spirit to Green and Tao. To state our result, we need a definition. We define a quasi-cube inductively (on the dimension). Any two-point set is a 1-dimensional quasi-cube. A ${d}$-dimensional set ${U}$ is a quasi-cube if ${U = U_0 \cup U_1}$, where ${U_j = (x_j + L_j) \cap U}$ with ${x_j \in \mathbb{Z}^d}$ and ${L_j}$ a hyperplane, and ${U_0,U_1}$ are themselves quasi-cubes. A cube is a quasi-cube. Also, the trapezoid:

$\displaystyle U = \{(0,0) , (1,0), (0,1) , (1,x)\} , \ \ \ \text{with} \ \ x \neq 0,$

is a quasi-cube.

${\ }$ Theorem 1: Let ${A \subset \mathbb{Z}^d}$ be finite. Suppose that $A$ contains a set $U$ which is a subset of a quasi-cube. Then

$\displaystyle |A+A| \geq |U|^{1/2} |A|. \ \ \ \spadesuit$ ${\ }$

This has applications to the sum-product problem via the Bourgain-Chang argument and will be explored in a future paper of Pálvölgyi and Zhelezov. We discuss some of the ideas. As mentioned above, Theorem 1 can be thought of as a discrete analog of the Brunn-Minkowski inequality. The Brunn-Minksowski inequality can be proved using the Prékopa-Leindler inequality, and this is the viewpoint we adopt.

To start, for ${A,B \subset \mathbb{Z}}$, we have

$\displaystyle |A+B| \geq |A| + |B| - 1,\ \ \ \ \ (2)$

and for ${X,Y \subset \mathbb{R}^d}$ are compact, we have

$\displaystyle \mu(X+Y) \geq \mu(X) + \mu(Y).\ \ \ \ \ (3)$

Both can be proved in the same way: by finding a translate of ${X}$ and a translate of ${Y}$ inside ${X+Y}$ that are almost disjoint. In the continuous setting, (3) is used to establish a functional analog. For compactly supported and bounded ${f,g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}}$ we define

$\displaystyle f \mathbin{\overline *} g(z) : = \sup_{x+y =z} f(x) g(y).$

Then the Prékopa-Leindler inequality implies that

$\displaystyle \int f \mathbin{\overline *} g \geq 2 ||f||_2 ||g||_2.\ \ \ \ \ (4)$

When ${f=1_X}$ and ${g=1_Y}$, we obtain a weaker variant of (3):

$\displaystyle \mu(X+Y) \geq 2 \mu(X)^{1/2} \mu(Y)^{1/2}.$

To go in the other direction, one applies (3) to the level sets of ${f}$ and ${g}$; Gardner’s survey provides more precise implications along these lines. Thus it is natural to ask for a functional analog of (2). Indeed, we let ${a,b: \mathbb{Z} \rightarrow \mathbb{R}_{\geq 0}}$ be compactly supported. Then Prékopa showed that

$\displaystyle ||a \mathbin{\overline *} b||_1 \geq 2 ||a||_2 ||b||_2 - 1.\ \ \ \ \ (5)$

We invite the reader to try to discover a proof; it is rather non-trivial! In any case, the next step in Prékopa-Leindler is to tensorize, as is explained in this blog post of Tao or in section 4 of the aforementioned survey of Gardner. The point is that a ${d}$-dimensional analog of the integral inequality (4) can be obtained by induction on the dimension, by applying the lower-dimensional version to functions such as ${f(x_1 , \ldots , x_{d-1} , x_d)}$ with ${x_d}$ fixed. Doing this, one obtains, for compactly supported and bounded ${f,g:\mathbb{R}^d \rightarrow \mathbb{R}_{\geq 0}}$,

$\displaystyle \int f \mathbin{\overline *} g \geq 2^d ||f||_2 ||g||_2.$

A slight generalization of this inequality quickly implies the Brunn-Minkowski inequality. In the discrete setting, the ${-1}$ present in both (2) and (5) is quite a nuisance, particularly in the tensorization step. To get around this, we observe that

$\displaystyle |A+B+U| \geq |A|+|B| \geq 2 |A|^{1/2} |B|^{1/2},$

for any ${U \subset \mathbb{Z}}$ of size 2. It turns out one has

$\displaystyle ||a \mathbin{\overline *} b \mathbin{\overline *} 1_U||_1 \geq 2 ||a||_2 ||b||_2 ,$

as was originally observed by Prékopa himself. One has an easier time tensorizing this, and following this one can obtain

$\displaystyle |A+B+\{0,1\}^d| \geq 2^d |A|^{1/2} |B|^{1/2}.$
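The one-dimensional inequality ${|A+B+U| \geq 2|A|^{1/2}|B|^{1/2}}$ for a two-element set ${U}$ is easy to stress-test numerically; here is a quick randomized sketch (the ranges and set sizes are arbitrary choices):

```python
import math
import random

# A randomized sanity check (a sketch) of |A + B + U| >= 2 sqrt(|A| |B|)
# for the two-element set U = {0, 1}. This holds since
# |A + B + U| >= |A + B| + 1 >= |A| + |B| >= 2 sqrt(|A| |B|).

random.seed(1)
U = [0, 1]
for _ in range(500):
    A = random.sample(range(-30, 30), random.randint(1, 8))
    B = random.sample(range(-30, 30), random.randint(1, 8))
    S = {a + b + u for a in A for b in B for u in U}
    assert len(S) >= 2 * math.sqrt(len(A) * len(B))
print("inequality verified on 500 random pairs")
```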

In our work, we take an abstract approach, defining

$\displaystyle \beta(U) = \inf_{A,B \neq \emptyset} \frac{|A+B+U|}{|A|^{1/2} |B|^{1/2}}.$

Note that ${\beta(U) \leq |U+U+U|/|U|}$. Indeed, ${\beta}$ is intended to be a replacement of the usual notion of the doubling constant, ${|U+U|/|U|}$. It turns out for certain large dimensional sets, we can accurately estimate ${\beta}$. For instance, we show that if ${U}$ is a subset of a quasi-cube then

$\displaystyle \beta(U) = |U|,$

which quickly implies Theorem 1. The upper bound follows immediately from the definition of ${\beta}$; it is the lower bound that takes some work. To do this, we show that the statement is equivalent to a functional analog and that tensorization works in general. We also explore general properties (for instance, ${\beta}$ is independent of the ambient group) and present a number of conjectures (which may be interpreted as statements we were unable to prove or disprove).

The above outline nearly works for subsets of quasi-cubes, though it turns out one needs a stronger 1 dimensional inequality of the form

$\displaystyle ||f \mathbin{\overline *} g \mathbin{\overline *} h_{\delta}||_1 \geq (1+\delta) ||f||_2 ||g||_2,$

where ${h_{\delta}}$ takes the values ${(1,\delta)}$ on a two-point set. The cases ${\delta = 0,1}$ were already known, but for other values of ${\delta}$ it is trickier. There are now two proofs: in the original paper we use a variational argument, while in the follow-up paper we derive it from the Prékopa-Leindler inequality.

This is a sort of “tensorization plus two-point inequality” argument, which is present in other works (e.g. Beckner’s inequality).

# The Frobenius Postage Stamp Problem, and Beyond

Andrew Granville and I just uploaded “The Frobenius postage stamp problem, and beyond” to the arXiv.

Let ${A \subset \mathbb{Z}}$ with ${\min A = 0}$, ${\max A = b}$, and whose elements have greatest common divisor ${1}$. For a positive integer ${N}$, we are interested in the structure of

$\displaystyle NA = \{a_1 + \ldots + a_N : a_i \in A\}.$

For instance,

$\displaystyle NA \subset \{0, \ldots ,bN\}.\ \ \ \ \ (1)$

Equality does not hold in general. For instance, if ${1 \notin A}$, then ${1 \notin NA}$ for any ${N \geq 1}$. We set

$\displaystyle \mathcal E(A) = \mathbb{Z}_{\geq 0} \setminus \cup_{M \in \mathbb{Z}_{\geq 1}} MA .$

Thus we can refine (1) to

$\displaystyle NA \subset \{0 , \ldots , bN\} \setminus \mathcal E (A).\ \ \ \ \ (2)$

It turns out there is one more obstruction. Note that ${n \in NA}$ if and only if ${bN-n \in N(b- A)}$, and so ${n \in NA}$ forces

$\displaystyle bN -n \notin \mathcal E(b-A).$

Thus we may refine (2) to

$\displaystyle NA \subset \{0 , \ldots , bN\} \setminus (\mathcal E (A) \cup (bN - \mathcal E(b-A))).\ \ \ \ \ (3)$

Our main result is the following.

${\ }$ Theorem 1: For ${N \geq b}$, (3) is an equality. ${\spadesuit}$ ${\ }$

Theorem 1 is close to best possible, as can be seen from ${A = \{0,1,b-1,b\}}$. For 3-element sets, we actually show Theorem 1 holds for all ${N \geq 1}$; the proof contains some of the ideas present in the general case.
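For a concrete sanity check, one can compute both sides of (3) directly. The sketch below uses the arbitrary 3-element example ${A = \{0,2,5\}}$, for which (as remarked above) equality holds for all ${N \geq 1}$:

```python
# A sketch verifying the equality in (3) on the small example A = {0, 2, 5}
# (so b = 5); for 3-element sets the equality holds for all N >= 1.

def exceptional(A, bound):
    """E(A) up to `bound`: nonnegative integers lying in no MA, M >= 1."""
    reachable = {0}  # 0 lies in every MA, since 0 is an element of A
    grew = True
    while grew:
        new = {r + a for r in reachable for a in A if r + a <= bound}
        grew = not new <= reachable
        reachable |= new
    return set(range(bound + 1)) - reachable

A = [0, 2, 5]
b = max(A)
for N in range(1, 8):
    NA = {0}
    for _ in range(N):  # N-fold sumset, computed directly
        NA = {x + a for x in NA for a in A}
    rhs = (set(range(b * N + 1))
           - exceptional(A, b * N)
           - {b * N - e for e in exceptional([b - a for a in A], b * N)})
    assert NA == rhs
print("equality in (3) holds for A = {0, 2, 5}, N = 1,...,7")
```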

We prove Theorem 1 for ${N\geq 2b}$ from first principles. To get from ${2b}$ down to ${b}$, there are two key additional inputs: a structural result of Savchev and Chen, and Kneser’s theorem from additive combinatorics.

A key definition is ${n_{a,A}}$, the smallest element of ${\mathcal P (A) : = \cup_{M \in \mathbb{Z}_{\geq 1}} MA}$ that is congruent to ${a \pmod b}$. For instance, it follows from a short argument of Erdős that ${n_{a,A} \in bA}$. Indeed, write

$\displaystyle n_{a,A} = \sum_{j=1}^r a_j , \ \ \ a_j \in A.$

If any subsum is ${0 \pmod b}$, we may remove it to get a smaller element of ${\mathcal{P}(A)}$ that is still ${a \pmod b}$, so we may assume no subsum is ${0 \pmod b}$. But this quickly implies that ${r \leq b}$, as desired.
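This argument is easy to confirm numerically. Here is a sketch on the arbitrary example ${A = \{0,3,5\}}$, checking that ${n_{a,A} \in bA}$ for every residue class:

```python
# A check of the Erdős argument above on the arbitrary example A = {0, 3, 5},
# b = 5: the smallest element of P(A) in each residue class mod b lies in bA.

A = [0, 3, 5]
b = max(A)

# bA, the b-fold sumset (0 is in A, so iterating S -> S + A works)
bA = {0}
for _ in range(b):
    bA = {x + a for x in bA for a in A}

# P(A) up to b*b, which suffices since n_{a,A} is in bA (max element b^2)
P = {0}
for _ in range(4 * b):
    P |= {x + a for x in P for a in A if x + a <= b * b}
n = {r: min(x for x in P if x % b == r) for r in range(b)}

assert all(n[r] in bA for r in range(b))
print(n)  # e.g. the smallest element congruent to 2 mod 5 is 12 = 3+3+3+3
```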

We also study a natural higher dimensional analog of Theorem 1, utilizing some basic tools such as Carathéodory’s Theorem and Mann’s Lemma. In this setting, we provide an analog of Theorem 1, though our bounds are not nearly as good.

# Distinct Consecutive Differences

Imre Ruzsa, József Solymosi, Endre Szemerédi, and I recently uploaded “On distinct consecutive differences” to the arXiv. We say ${A = \{a_1 < \ldots < a_k\} \subset \mathbb{R}}$ is convex if for all ${1 \leq i \leq k-2}$, one has

$\displaystyle a_{i+2} - a_{i+1} > a_{i+1}- a_{i}.$

We also adopt Vinogradov’s asymptotic notation. The fundamental question in the area is the following.

${\ }$ Conjecture 1 (Erdős): Suppose ${A \subset \mathbb{R}}$ is convex. Then for any ${\epsilon >0}$, one has

$\displaystyle |A+A| \gg |A|^{2-\epsilon}. \ \ \ \spadesuit$${\ }$

Progress towards Conjecture 1 was originally made by Hegyvári, with significant improvements by several authors. There is a natural barrier in that for convex ${A}$

$\displaystyle |A+A| \gg |A|^{3/2},\ \ \ \ \ (1)$

that was obtained by several methods. In the current work, we present one that allows for a generalization of independent interest.

${\ }$ Theorem 1 (Distinct consecutive differences): Let ${A \subset \mathbb{R}}$ be a set whose consecutive differences are all distinct, and let ${B \subset \mathbb{R}}$ be arbitrary. Then

$\displaystyle |A+B| \gg |A| |B|^{1/2} .\ \ \ \spadesuit$${\ }$

Note that Theorem 1 immediately implies (1). It turns out that Theorem 1 is best possible, up to the constant. In the extremal construction we present, ${A}$ and ${B}$ have very different structures, and so it is natural to ask if one can improve upon Theorem 1 in the case ${B= A}$. The proof is short and purely combinatorial, in a similar spirit to a proof of the Szemerédi-Trotter theorem for Cartesian products found in these notes of Solymosi.
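As a quick numerical illustration of Theorem 1 (a sketch, with the illustrative constant ${1/2}$, since the implicit constant in the theorem is unspecified), one can take ${A}$ to be the squares, which are convex, and ${B}$ random:

```python
import random

# A numerical sanity check (a sketch with arbitrary sets and the
# illustrative constant 1/2). A is convex, hence has distinct consecutive
# differences; B is random.

random.seed(2)
A = [n * n for n in range(1, 41)]  # the squares: convex
for _ in range(100):
    B = random.sample(range(10**6), 50)
    S = {a + b for a in A for b in B}
    assert len(S) >= 0.5 * len(A) * len(B) ** 0.5
print("checked |A+B| >= |A| |B|^{1/2} / 2 for 100 random B")
```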

We also provide a short proof that for convex ${A}$

$\displaystyle |A+A-A| \gg |A|^2,$

which is certainly implied by Conjecture 1. It is interesting to note that the proof is quite inflexible, in that for each ${x}$ we find only one representation

$\displaystyle x = a+ b - c, \ \ \ a,b,c \in A.$

For instance, I do not see how to find ${|A|^2}$ elements with at least 100 representations as ${a+b-c}$.

We also present a proof of an improvement to (1). It is somewhat annoying that all improvements to (1) lead to quite small quantitative gains. The interested reader should first see this short paper of Schoen and Shkredov, as well as this previous blog post and this other previous blog post. I also have some informal notes on the spectral method of Shkredov, which I can distribute upon request.

# Recent Progress on a Problem of Sárközy

I would like to thank Brandon Hanson, Giorgis Petridis, and Kevin Ford for helpful contributions to this post.

A well-known problem of Sárközy is the following.

${\ }$ Conjecture 1: Let ${p}$ be a prime and ${S \subset \mathbb{F}_p^{\times}}$ be the set of squares. Suppose ${A+B = S}$. Then either ${|A| = 1}$ or ${|B| = 1}$. ${\spadesuit}$ ${\ }$

As the squares are a multiplicative subgroup, it is natural to guess that they cannot be written additively as a sumset. Conjecture 1 is in a similar spirit to a long list of conjectures concerning the independence of multiplication and addition, such as the twin prime conjecture, the abc-conjecture, and the sum-product conjecture.
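For very small primes, Conjecture 1 can be confirmed by exhaustive search. The sketch below checks ${p = 13}$, using the observation that if ${S = A+B}$, then ${B}$ is contained in the maximal set ${\{b : b + A \subset S\}}$, and replacing ${B}$ by that maximal set still gives sumset ${S}$:

```python
# A brute-force verification of Conjecture 1 for p = 13 (a sketch).
# If S = A + B with |A| >= 2, then B lies inside the maximal set
# B_max = {b : b + A is contained in S}, and A + B_max = S as well,
# so it suffices to test each A against B_max.

p = 13
S = {pow(x, 2, p) for x in range(1, p)}  # the nonzero squares mod 13

found = False
for mask in range(1 << p):
    A = [i for i in range(p) if mask >> i & 1]
    if len(A) < 2:
        continue
    B = [b for b in range(p) if all((a + b) % p in S for a in A)]
    if len(B) >= 2 and {(a + b) % p for a in A for b in B} == S:
        found = True
        break
print(found)  # False: no decomposition with |A|, |B| > 1 exists for p = 13
```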

Progress towards Conjecture 1, as of last week, was summarized in this mathoverflow post and this other mathoverflow post (and the papers referenced within). Sárközy proved that if ${A+B = S}$, then

$\displaystyle \frac{\sqrt{p}}{3 \log p} \leq |A| , |B| \leq \sqrt{p} \log p,\ \ \ \ \ (1)$

which we recall below. In fact, Shparlinski (in Theorem 7) improved (1) and then later Shkredov showed (in Corollary 2.6)

$\displaystyle (1/6 - o(1)) \sqrt{p} \leq |A| , |B| \leq (3 + o(1)) \sqrt{p}.$

Brandon Hanson and Giorgis Petridis, utilizing the polynomial method, recently made significant progress towards Conjecture 1.

${\ }$ Theorem 1 (Hanson-Petridis): Suppose ${A+B = S}$. Then

$\displaystyle |A| |B| = |S|.$

In particular every element of ${S}$ has a unique representation of the form

$\displaystyle a + b , \ \ \ a \in A \ \ , b \in B. \ \ \ \spadesuit$

${\ }$

Their techniques handle the case where ${S}$ is replaced by any nontrivial multiplicative subgroup, but we focus on the squares for simplicity. In particular, if ${(p-1)/2}$ is prime, then Conjecture 1 is established. This implies Conjecture 1 is true for infinitely many primes, provided there are infinitely many Sophie Germain primes (yet another conjecture based on the independence of multiplication and addition). Making use of (1), we are able to prove the following unconditional result. Here ${\pi(x)}$ is the prime counting function.

${\ }$ Corollary 1: For all but ${o(\pi(x))}$ primes less than ${x}$, Conjecture 1 holds. ${\spadesuit}$ ${\ }$

Proof: Let ${p}$ be a prime. Suppose ${A+B = S}$ with ${|A| , |B| > 1}$. Then by Theorem 1 and (1), ${(p-1)/2}$ has a divisor between ${\frac{\sqrt{p}}{3 \log p} }$ and ${\sqrt{p} \log p}$. By Theorem 6 (followed by Theorem 1 (v)) in Kevin Ford’s work on the distribution of integers with divisors in a given interval, there are at most ${o(x / \log x)}$ such primes ${p \leq x}$. Corollary 1 then follows from the prime number theorem. ${\spadesuit}$ ${\ }$

If ${S = A+B}$, we always have the trivial bound

$\displaystyle (p-1)/2 = |A+B| \leq |A||B|, \ \ \ \ \ (2)$

which can be compared to the following bilinear estimate.

${\ }$ Theorem 2: Let ${\chi}$ be the quadratic character modulo ${p}$. Then

$\displaystyle |\sum_{a \in A , b \in B} \chi(a +b ) | \leq \sqrt{p |A| |B|}.$

In particular if ${S = A+B}$, then

$\displaystyle |A| |B| \leq p. \ \ \ \spadesuit$

${\ }$

The proof of Theorem 2, which we give, is standard Fourier manipulations (see chapter 4 of Tao and Vu for more details). As we will see below, Hanson and Petridis make no use of this perspective.

${\ }$ Proof: The second statement follows from the first as if ${S= A + B}$, then

$\displaystyle \sum_{a \in A , b \in B} \chi(a+b) = |A| |B|.$

By Fourier inversion, it is enough to show

$\displaystyle \Big| \sum_{a \in A , b \in B , \xi \in \mathbb{F}_p} e_p(\xi (a +b)) \widehat{\chi}(\xi) \Big| \leq \sqrt{p |A| |B|}, \ \ \ e_p(\theta) : = e^{2 \pi i \theta / p} ,\ \ \ \ \ (3)$

where

$\displaystyle \widehat{f}(\xi) = \frac{1}{p} \sum_{x \in \mathbb{F}_p} f(x) e_p(-\xi x).$

Note ${ \widehat{\chi}(0) = 0}$, and the usual estimate for Gauss sums implies

$\displaystyle |\widehat{\chi}(\xi)| \leq p^{-1/2} , \ \ \ \xi \neq 0.$

Combining with (3), it is enough to show

$\displaystyle \sum_{\xi \in \mathbb{F}_p^{\times}} |\widehat{1_A}(\xi)||\widehat{1_B}(\xi)| \leq p^{-1} \sqrt{|A| |B|} .$

But this follows from Cauchy-Schwarz and Parseval. ${\spadesuit}$ ${\ }$

Combining (2) and Theorem 2, we see that if ${S = A+B}$, then

$\displaystyle (p-1)/2 \leq |A||B| \leq p.\ \ \ \ \ (4)$

Thus Theorem 1 asserts that the lower bound in (4) is the only possibility. We now proceed towards a proof of Theorem 1. The starting point is a classical theorem of Fermat.

${\ }$ Theorem 3 (Fermat): Let ${a \in \mathbb{F}_p}$. Then ${a \neq 0}$ if and only if ${a}$ is a root of ${x^{p-1} - 1}$. ${\spadesuit}$ ${\ }$

There are many proofs of this elementary fact, for instance it is a quick consequence of Lagrange’s theorem. As a consequence, we have the following.

${\ }$ Corollary 2: Let ${a \in \mathbb{F}_p^{\times}}$. Then ${a}$ is a square if and only if ${a}$ is a root to

$\displaystyle x^{(p-1) / 2} - 1. \ \ \ \spadesuit \ \ \ \ \ (5)$

${\ }$

Proof: Every square is a root of (5) as if ${a = x^2}$ for some nonzero ${x}$ then by Theorem 3,

$\displaystyle a^{(p-1)/2} = x^{p-1} = 1.$

Thus the squares are a subset of the roots of (5). On the other hand, there are ${(p-1)/2}$ squares and at most ${(p-1)/2}$ roots, and so the set of squares is precisely the set of roots of (5). ${\spadesuit}$ ${\ }$
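Corollary 2 (Euler’s criterion) is easy to check numerically for a small prime:

```python
# A quick check of Corollary 2 for a small prime: the nonzero squares
# mod p are exactly the roots of x^{(p-1)/2} - 1.

p = 101
squares = {pow(x, 2, p) for x in range(1, p)}
roots = {a for a in range(1, p) if pow(a, (p - 1) // 2, p) == 1}
print(squares == roots, len(squares))  # True 50
```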

It is worth noting that there is a significant gap between the degree of the polynomial in (5) and the degree in Theorem 3, which we make use of later. We now give a polynomial characterization of the cardinality of a set.

${\ }$ Lemma 1: Let ${0 \leq n \leq p}$ and ${A \subset \mathbb{F}_p}$. Then ${|A| \geq n}$ if and only if for any ${d_0 , \ldots , d_{n-1} \in \mathbb{F}_p}$, the equations

$\displaystyle \sum_{a\in A} c_a a^k = d_k , \ \ \ 0 \leq k \leq n-1,$

have a solution in the ${c_a}$. ${\spadesuit}$ ${\ }$

Proof: This is a classical fact about Vandermonde matrices. ${\spadesuit}$ ${\ }$

We now proceed to the proof of Theorem 1, which adopts the Stepanov method of auxiliary polynomials.

${\ }$ Proof of Theorem 1: Let ${A , B \subset \mathbb{F}_p}$ and suppose ${A+B = S}$. We may assume ${|A| < (p-1)/2}$, which is possible in light of (4). By Lemma 1, we may choose ${c_a\in \mathbb{F}_p}$ so that ${g \equiv 0}$, where

$\displaystyle g(x) = - 1 + \sum_{a \in A} c_a (x+a)^{|A| -1} .\ \ \ \ \ (6)$

Let

$\displaystyle F(x) = -1 + \sum_{a\in A} c_a (x+a)^D, \ \ \ D = (p-1) / 2 + |A| -1.\ \ \ \ \ (7)$

Our choice of ${c_a}$ will ensure that each ${b \in B}$ is a root of ${F}$ with high multiplicity and that ${F \neq 0}$. By Corollary 2, since ${a+b \in S}$,

$\displaystyle F(b) = -1 + \sum_{a\in A} c_a (b+a)^D =- 1 + \sum_{a\in A} c_a (b+a)^{|A| - 1} = g(b) = 0.$

Thus each ${b \in B}$ is a root of (7). Furthermore,

$\displaystyle F'(x) = \sum_{a \in A} c_a D(x+a)^{D-1},$

and so, again by Corollary 2,

$\displaystyle F'(b) = \frac{D}{|A|-1}\, g'(b) = 0.$

We may apply this argument for higher derivatives to obtain

$\displaystyle F^{(j)} (b) = 0 , \ \ \ 0 \leq j \leq |A| - 1.$

Thus each ${b \in B}$ is a root of ${F}$ with multiplicity ${|A|}$ and it follows that, if ${F \neq 0}$,

$\displaystyle |A| |B| \leq {\rm deg}(F).\ \ \ \ \ (8)$

It turns out that our previous choice of ${c_a}$ ensures that ${{\rm deg}(F) = (p-1)/2}$ (so ${F}$ is indeed nonzero). For instance, since ${g \equiv 0}$, considering the leading term in (6), we have

$\displaystyle \sum_{a \in A} c_a = 0,$

and so the leading term of ${F}$, which is the same, is also zero. The same argument shows that the coefficient of ${x^j}$ in ${F}$ is zero for ${(p-1)/ 2 + 1 \leq j \leq D}$. Now the constant term of ${g}$ in (6) is

$\displaystyle -1 + \sum_{a \in A} c_a a^{|A| -1} = 0,$

while the coefficient of the ${x^{(p-1)/2}}$ term in ${F}$ is

$\displaystyle {D \choose (p-1)/2} \sum_{a \in A} c_a a^{|A| - 1} = {D \choose (p-1) / 2} \neq 0.$

Combining this with (8), we find

$\displaystyle |A||B| \leq (p-1)/2.$

The second assertion in the Theorem follows immediately (see also Proposition 2.3 in Tao and Vu). ${\spadesuit}$ ${\ }$

Remark 1: Suppose ${A+B = S}$ with ${|A||B| = (p-1)/2}$. The proof of Theorem 1 reveals that

$\displaystyle F(x) = \alpha \prod_{b \in B} (x- b)^{|A|}, \ \ \ \alpha = {D \choose |A| - 1},$

with ${F}$ and ${D}$ defined in (7). Furthermore, we have the identity

$\displaystyle \prod_{a \in A , b \in B} (x - (a+b)) = x^{(p-1)/2} - 1. \ \ \ \spadesuit$

${\ }$

A close variant of the extremal case left open by Theorem 1 was studied by Lev and Sonn. Following Sárközy, we now prove (1) (with little concern for the precise constants). The key input is the Weil bound, and so the square-root barrier that appears in (1) is natural (as discussed in this previous blog post).

${\ }$ Proof of (1): Let ${p}$ be a prime and suppose ${A+B = S}$. Without loss of generality, we may suppose ${|B| \geq |A|}$. By (2), it is enough to show

$\displaystyle |B| \lesssim \sqrt{p} \log p.\ \ \ \ \ (9)$

Let ${\chi}$ be the Legendre symbol and consider, for ${A' \subset A}$,

$\displaystyle h(x) = 2^{-|A'|} \prod_{a \in A'} (\chi(x + a) + 1) .\ \ \ \ \ (10)$

Thus ${h \geq 0}$ and by our assumption ${h(b) = 1}$ for any ${b \in B}$ and so

$\displaystyle |B| \leq \sum_{x \in \mathbb{F}_p} h(x). \ \ \ \ \ (11)$

On the other hand, expanding the product in (10) and applying the Weil bounds of the form

$\displaystyle \left| \sum_{x \in \mathbb{F}_p} \chi(x+ a_1) \cdots \chi(x+ a_k) \right| \leq (k-1) \sqrt{p},$

for distinct ${a_1 , \ldots , a_k}$, we find

$\displaystyle |B| \leq \sum_{x \in \mathbb{F}_p} h(x) \leq p 2^{-|A'|} + (|A'|- 1) \sqrt{p} , \ \ \ A' \subset A.\ \ \ \ \ (12)$

This establishes (9) if ${|A| \geq \log p / \log 4}$, since then we may choose ${\log p / \log 4 \leq |A'| \leq \log p / \log 4 + 1}$. Otherwise, we apply (12) with ${A' = A}$ and multiply both sides by ${|A|}$, and using (2), we find

$\displaystyle (p-1) / 2 \leq p 2^{-|A|} + (|A| - 1) \sqrt{p}.$

This implies, crudely, that for ${|A| \geq 2}$,

$\displaystyle p/4 \lesssim \sqrt{p} \log p,$

which is absurd for ${p}$ large (and letting the implied constants handle the case ${p}$ small). ${\spadesuit}$
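The inequality (12) is easy to illustrate numerically. Below is a sketch; the prime and the set ${A'}$ are arbitrary choices, and the Legendre symbol is computed via Euler's criterion.

```python
# Illustrate (12): with chi the Legendre symbol and h as in (10),
# sum_x h(x) <= p 2^{-|A'|} + (|A'| - 1) sqrt(p).
import math

def legendre(x, p):
    """Legendre symbol chi(x) mod an odd prime p, via Euler's criterion."""
    x %= p
    if x == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

p = 101
A_prime = [1, 5, 17]  # an arbitrary small test set of distinct residues

total = sum(
    math.prod(legendre(x + a, p) + 1 for a in A_prime) / 2 ** len(A_prime)
    for x in range(p)
)
bound = p * 2 ** (-len(A_prime)) + (len(A_prime) - 1) * math.sqrt(p)
print(total <= bound)  # True
```

Note `total` is roughly ${p 2^{-|A'|}}$, a weighted count of the ${x}$ for which every ${x + a}$ is a quadratic residue, and the Weil bounds control the fluctuation around this main term.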

# Sidon-like Sets

What is the largest set avoiding a linear equation? This general question is a central one in additive combinatorics. Indeed the search for extremal structures gives it a combinatorial flavor, while the presence of an additive equation makes such a question ripe for additive techniques. Such questions include well-studied subjects such as sum-free sets, Sidon sets, and sets avoiding arithmetic progressions. In 1993, Ruzsa made an effort to put these examples into a general framework. In what follows, we focus on Sidon sets and their generalizations, as developed by Ruzsa in the same paper.

We fix ${k \geq 2}$ and consider the linear equation

$\displaystyle x_1 + \ldots + x_k = y_1 + \ldots + y_k \ \ \ \ \ (1)$

Following Ruzsa, we say a solution is trivial if the ${x_i}$ are a permutation of the ${y_i}$. We set ${r(N) = r_k(N)}$ to be the size of the largest subset of ${[N]}$ with only trivial solutions to (1). We set ${R(N) = R_k(N)}$ to be the size of the largest subset ${A \subset [N]}$ with no solutions to (1) in ${A}$ with ${x_1 , \ldots , x_k , y_1 , \ldots , y_k}$ distinct. It follows from these definitions that

$\displaystyle r(N) \leq R(N).$

We will show below that ${R_k(N) \ll_k r(N)}$, while it is not known if ${R_k(N) \ll r_k(N)}$. Indeed one may feel that most of the solutions to (1) where the variables lie in a set ${A}$ come when the variables are distinct, since otherwise we have imposed an additional constraint. This idea does work well for sets with additive structure, but has shortcomings for general sets. Furthermore, it was shown by Timmons that ${r_3(N) \neq (1 + o(1)) R_3(N)}$.

For a set ${A \subset [N]}$, we let ${n(A)}$ be the number of nontrivial solutions to (1) in ${A}$ and let ${N(A)}$ be the number of solutions to (1) with distinct variables.

We briefly remark that in Wooley’s efficient congruencing proof of Vinogradov’s mean value theorem, he estimates quantities similar to ${N(A)}$ (where ${A}$ is the set of ${k^{\rm th}}$ powers), as applications of Hensel’s lemma require the variables to be distinct (modulo a certain prime). Indeed there are some overlapping techniques (see Lemma 1 below).

To start, we first consider the case ${k=2}$. A set with ${n(A) = 0}$ is known in the additive combinatorics literature as a Sidon set; equivalently, ${r_{A-A}(x) \leq 1}$ for ${x \neq 0}$. There is a nice construction due to Ruzsa (modifying a previous construction of Erdős), which is just one of several known constructions.

Example 1 (Large Sidon Set): Let ${p}$ be a prime and let ${g}$ be a primitive root in ${\mathbb{F}_p}$. One can check that the set of residues in the cyclic group of order ${p(p-1)}$, defined via the Chinese remainder theorem by

$\displaystyle x = k \ {\rm mod} \ p-1 , \ x = g^k \ {\rm mod} \ p , \ \ \ 0 \leq k \leq p-1,$

forms a Sidon set in the cyclic group modulo ${p(p-1)}$ and hence in ${\mathbb{Z}}$. ${\spadesuit}$
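A minimal sketch of Example 1 in python (the choices ${p = 11}$ and ${g = 2}$ are for illustration), verifying the Sidon property by checking that all ordered differences are distinct:

```python
# Example 1: glue x = k (mod p-1) and x = g^k (mod p) by the Chinese
# remainder theorem and check the result is a Sidon set mod p(p-1).
from itertools import permutations

def crt(r1, m1, r2, m2):
    """Smallest nonnegative x with x = r1 (mod m1) and x = r2 (mod m2)."""
    return next(x for x in range(m1 * m2) if x % m1 == r1 and x % m2 == r2)

p, g = 11, 2  # 2 is a primitive root mod 11
M = p * (p - 1)
A = [crt(k % (p - 1), p - 1, pow(g, k, p), p) for k in range(p - 1)]

# Sidon in Z/MZ: every nonzero difference a - a' arises exactly once.
diffs = [(a - b) % M for a, b in permutations(A, 2)]
print(len(A), len(diffs) == len(set(diffs)))  # 10 True
```

The set has size ${p-1}$ inside a group of order ${p(p-1)}$, matching the ${N^{1/2}}$ order of magnitude claimed below.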

On the other hand, we have the following upper bound.

Theorem 1: Let ${A \subset [N]}$ be a Sidon set. Then

$\displaystyle |A| \leq N^{1/2} + N^{1/4} + 1. \ \ \ \spadesuit$

Proof: We follow an argument of Green, which will rely on some basic Fourier analysis (we follow notation from Chapter 4 of Tao and Vu’s book on additive combinatorics). Let ${1 \leq u \leq N}$ and consider ${A}$ as a subset of ${Z : = \mathbb{Z} / (N+u) \mathbb{Z}}$. Let ${I = \{1 , \ldots , u\}}$. We have by Parseval,

$\displaystyle (N+u)^2\sum_{x \in Z} 1_A * 1_{-A}(x) 1_I * 1_{-I}(x) = (N+u)^3 \sum_{\xi} |\widehat{1_A}(\xi)|^2 |\widehat{1_I}(\xi)|^2 \geq \frac{|A|^2 |I|^2}{N+u},\ \ \ \ \ (2)$

in the last step isolating the ${\xi = 0}$ term. Since ${A}$ is a Sidon set we have

$\displaystyle (N+u)1_{A} * 1_{-A}(x) \leq 1,$

for ${0 < |x| \leq u}$ (note this is not true for other values of ${x \in Z}$). Thus

$\displaystyle (N+u)^2\sum_{x \in Z} 1_A * 1_{-A}(x) 1_I * 1_{-I}(x) \leq |A| u + u^2.\ \ \ \ \ (3)$

Combining (2) and (3) and choosing ${u = \lfloor N^{3/4} \rfloor}$ (the optimal choice) gives the desired result. ${\spadesuit}$

Green remarks in his paper that the above proof only assumes ${r_{A-A}(x) \leq 1}$ for ${0 < x < N^{3/4}}$ instead of the full Sidon set property. Combining Example 1 and Theorem 1, we find

$\displaystyle N^{1/2} \leq r_2(N) \leq N^{1/2} + N^{1/4} + 1.$

It is a famous question of Erdős to improve these bounds (in particular the lower bound), which have been stuck for over fifty years (modulo an improvement by Cilleruelo to ${N^{1/2} + N^{1/4} + 1/2}$). It seems likely that both the lower and upper bound are not optimal and that the truth lies somewhere between ${N^{1/2} + C}$ and ${N^{1/2} + O(N^{\epsilon})}$ for any ${C\geq 1}$ and ${\epsilon > 0}$. See this blog post of Gowers and the comments within for interesting discussion of this problem.

We now consider general ${k}$ and start with a lower bound.

Proposition 1 (Random sets): For ${N}$ large enough,

$\displaystyle r(N) \geq \frac{1}{2}8^{-1/(2k-1)} N^{1/(2k-1)} . \ \ \ \spadesuit$

Proof: We use the alteration method, which is discussed in Alon and Spencer’s book on the Probabilistic Method. Choose ${A \subset [N]}$ where each element is chosen independently at random with probability

$\displaystyle p = 8^{-1/(2k-1)}\frac{N^{1/(2k-1)}}{N}.$

The number of solutions to (1) in ${[N]}$ is at most ${N^{2k-1}}$, so the expected number of solutions to (1) is at most ${p^{2k}N^{2k-1}}$. By Markov,

$\displaystyle \mathbb{P}(n(A) > 2p^{2k}N^{2k-1}) < \frac{1}{2}.$

On the other hand, by say Chebyshev,

$\displaystyle \mathbb{P}(|A| < \frac{1}{2} p N) < \frac{1}{2},$

for ${N}$ large enough. By the union bound, we may choose an ${A}$ such that ${|A| \geq pN/2}$ and ${n(A) \leq 2p^{2k} N^{2k-1}}$. By our choice of ${p}$, we have ${n(A) \leq 2p^{2k} N^{2k-1} = pN/4 \leq |A|/2}$. For each nontrivial solution of (1), we delete an element of ${A}$ that renders it no longer a solution (the so-called alteration) and call the resulting set ${A'}$. Thus ${n(A') = 0}$ and so ${r(N) \geq |A'| \geq |A| - n(A) \geq pN/4}$, which is the stated bound with the constant ${\frac{1}{2}}$ weakened to ${\frac{1}{4}}$ (the precise constant is of no importance here). ${\spadesuit}$
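The proof can be simulated directly. Below is a sketch for ${k = 2}$; the parameters and the rule for which element to delete are arbitrary choices.

```python
# Alteration method for k = 2: pick a random A in [N] with the density from
# the proof, then delete one element from each nontrivial a+b = c+d.
import random
from itertools import combinations_with_replacement

random.seed(0)
N = 10 ** 6
p = 8 ** (-1 / 3) * N ** (1 / 3) / N  # the density from the proof, k = 2
A = {n for n in range(1, N + 1) if random.random() < p}

def nontrivial_solution(A):
    """Return two distinct multisets {a,b}, {c,d} with a+b = c+d, if any."""
    seen = {}
    for pair in combinations_with_replacement(sorted(A), 2):
        s = pair[0] + pair[1]
        if s in seen:
            return seen[s], pair
        seen[s] = pair
    return None

while (sol := nontrivial_solution(A)) is not None:
    A.discard(sol[0][0])  # the alteration: delete one participating element

print(len(A), nontrivial_solution(A) is None)
```

The surviving set is Sidon, of size roughly ${N^{1/3}}$, far from the ${N^{1/2}}$ of Example 1, in line with Ruzsa's remark below.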

Ruzsa remarks that the probabilistic construction above probably never gives the correct order of magnitude for ${r(N)}$, where (1) is replaced by a suitable linear equation. We already saw this above, as Example 1 gives a Sidon set of size ${N^{1/2}}$ while the random construction gives one of size about ${N^{1/3}}$. Moreover, with an explicit construction, Bose and Chowla showed the following improvement to Proposition 1:

$\displaystyle r(N) \geq (1 + o(1)) N^{1 / k},$

(see this survey of Kevin O'Bryant for more information). Timmons showed that

$\displaystyle R(N) \geq (2^{1 - 1/k} + o(1))N^{1/k},$

by pasting together two sets from the aforementioned Bose-Chowla construction.

We now turn to upper bounds. We let ${T_k(A)}$ be the number of solutions in ${A}$ to (1). For any set with ${n(A) = 0}$, we have that ${T_k(A) \leq k!|A|^k}$ and so by Cauchy–Schwarz

$\displaystyle |A|^{2k} \leq |kA| T_k(A) \leq k N k! |A|^k.\ \ \ \ \ (4)$

It follows that

$\displaystyle r(N) \leq (k! k )^{1/k} N^{1/k},$

and thus

$\displaystyle r(N) \asymp_k N^{1/k}.$

This argument fails to bound ${R(N)}$, as was observed by Ruzsa, who provided an alternative argument, which we recall below. The basic idea is to still use (4), and to show that ${T_k(A)}$ is still small for sets with ${N(A) = 0}$. We will need the following lemma.

Lemma 1 (Hölder): Let ${f}$ be a function on the torus. Then

$\displaystyle \int |f|^{2k-2} \leq \left( \int |f|^{2k} \right)^{1 - 1/(k-1)} \left(\int |f|^2 \right)^{1/(k-1)}.$

To prove this, apply Hölder's inequality to ${|f|^{2k-2} = |f|^{2k - 2 - 2/(k-1)} |f|^{2/(k-1)}}$ with exponents ${(k-1)/(k-2)}$ and ${k-1}$. Note that one can always interpolate an upper bound for ${\int |f|^p}$ in terms of ${\int |f|^u}$ and ${\int |f|^v}$ as long as ${u < p < v}$.
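Since Lemma 1 holds over any probability space, it can be sanity-checked numerically with the uniform measure on a finite sample (a sketch; the parameters are arbitrary choices):

```python
# Check E|f|^{2k-2} <= (E|f|^{2k})^{1-1/(k-1)} (E|f|^2)^{1/(k-1)} for a
# random complex-valued f, with E the mean over n sample points.
import random

random.seed(1)
k, n = 4, 1000
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

def moment(f, p):
    """E |f|^p with respect to the uniform measure on the sample."""
    return sum(abs(x) ** p for x in f) / len(f)

lhs = moment(f, 2 * k - 2)
rhs = moment(f, 2 * k) ** (1 - 1 / (k - 1)) * moment(f, 2) ** (1 / (k - 1))
print(lhs <= rhs)  # True
```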

Proposition 2: We have

$\displaystyle R(N) \leq (1 + o(1)) k^{2 - 1/k} N^{1/k}. \ \ \ \spadesuit$

Proof: Suppose that ${N(A) = 0}$ (though we will not make use of this until the end). By (4), it is enough to show

$\displaystyle T_k(A) \leq (1 + o(1))k^{2k-2} |A|^k.$

Note that

$\displaystyle T_k(A) = \int |f|^{2k} , \ \ \ f(x) = \sum_{a\in A} e^{2 \pi i a x}.$

By Lemma 1 and Parseval, it is enough to show

$\displaystyle T_k(A) \leq (1 + o(1))k^2 |A| \int |f|^{2k-2}.$

The idea is that ${N(A) = 0}$ allows us to eliminate a variable at a cost of only ${k^2 |A|}$, as opposed to the trivial bound of ${|A|^2}$.

We let

$\displaystyle s(n) = \#\{(x_1 , \ldots , x_k) \in A^k : x_1 + \ldots + x_k = n, \ \ x_i \ \text{distinct} \},$

and

$\displaystyle \sigma(n) = \#\{(x_1 , \ldots , x_k) \in A^k : x_1 + \ldots + x_k = n\}.$

Note we have

$\displaystyle T_k(A) = \sum_n \sigma(n)^2.$

By the triangle inequality in ${\ell^2}$, it is enough to show

$\displaystyle \sum_n (\sigma(n) - s(n))^2 \leq k^4 \int |f|^{2k-2} , \ \ \ \sum_n s(n)^2 \leq k^2 |A| \int |f|^{2k-2} .\ \ \ \ \ (5)$

Note the second term is the larger of the two for ${|A|}$ large.

We dispose of the first inequality. Note ${\sigma(n) - s(n)}$ is the number of solutions to ${x_1 + \ldots + x_k = n}$ with some ${x_i = x_j}$. There are at most ${k^2}$ choices for the indices and then after relabelling we find

$\displaystyle 2x_1 + x_2 + \ldots + x_{k-1} = n.$

Thus

$\displaystyle \sum_n (\sigma(n) - s(n))^2 \leq k^4 \int |f(2x)|^2 |f(x)|^{2k-4} \leq k^4 \int |f|^{2k-2},$

by Hölder’s inequality.

Now we finally use that ${N(A) = 0}$. Suppose we have a solution to (1) with distinct ${x_i}$ and distinct ${y_j}$. Since ${N(A) = 0}$, this implies ${x_i = y_j}$ for some ${1 \leq i , j \leq k}$. Thus

$\displaystyle \sum_n s(n)^2 \leq k^2 |A| \int |f|^{2k-2},$

since we have ${k^2}$ choices for ${i,j}$ and ${|A|}$ choices for ${x_i}$. ${\spadesuit}$
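As a sanity check on the quantities in the proof, one can compute ${\sigma(n)}$, ${s(n)}$, and ${T_k(A)}$ by brute force for a small set (a sketch; the set ${A}$ is an arbitrary choice):

```python
# Verify T_k(A) = sum_n sigma(n)^2 by brute force, and that s <= sigma.
from itertools import product

def rep_counts(A, k, distinct=False):
    """Counts of n = x_1 + ... + x_k with x_i in A (optionally distinct)."""
    counts = {}
    for xs in product(A, repeat=k):
        if distinct and len(set(xs)) < k:
            continue
        counts[sum(xs)] = counts.get(sum(xs), 0) + 1
    return counts

A, k = [1, 2, 4, 8, 13], 3
sigma = rep_counts(A, k)
s = rep_counts(A, k, distinct=True)

T_k = sum(v * v for v in sigma.values())
brute = sum(1 for t in product(A, repeat=2 * k) if sum(t[:k]) == sum(t[k:]))
print(T_k == brute, all(s.get(n, 0) <= sigma[n] for n in sigma))  # True True
```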

The best bound for ${r(N)}$ (for ${k}$ large) is due to Green and is of the form

$\displaystyle r(N) \leq (1 + o_N(1)) (1 + o_k(1)) \frac{k}{2e} N^{1/k},\ \ \ \ \ (6)$

an improvement to the constant in (4). On the other hand, the best bound for ${R(N)}$ is in a recent paper of Schoen and Shkredov:

$\displaystyle R(N) \leq 16 k^{3/2} N^{1/k}, \ \ \ N \ \text{large enough}\ \ \ \ \ (7)$

adopting ideas from Timmons as well as Ruzsa’s original work. Schoen and Shkredov implicitly mention the following question.

Question 1: Let ${A \subset \mathbb{Z}}$ such that ${N(A) = 0}$. Does there exist a large ${B \subset A}$ (say ${|B| \geq |A| / 2}$) such that ${B}$ has no solutions to

$\displaystyle x_1 + \ldots + x_{\ell} = y_1 + \ldots + y_{\ell},\ x_i, y_j \ \text{distinct} \ \ \ 1 \leq \ell \leq k.\ \ \ \ \ (8)$

An affirmative answer would allow one to apply (5) and then (4) to ${B}$ and eventually get improved bounds for ${R(N)}$ comparable to what is known for ${r(N)}$ (i.e. the same as (6) up to the absolute constant). Schoen and Shkredov are not able to solve Question 1, but are able to show that (8) holds for many ${\ell}$ and then apply a suitable version of Erdős-Ko-Rado to obtain (7). Underlying their argument is the simple observation that a solution to (1) with ${k = m}$ can be added to a solution with ${k = n}$ to create a new solution with ${k = m + n}$.

We illustrate part of the argument in the ${k=4}$ case, which is significantly simpler than the general case and already handled by Timmons.

Proposition 3: One has

$\displaystyle R_4(N) \leq (1 + o(1)) 2^{1/8} 4^{5/8} N^{1/4}. \ \ \ \spadesuit$

Proof: We let ${A \subset [N]}$ with ${N(A) = 0}$. By the first part of (5) and Lemma 1, we have

$\displaystyle T_4(A) \leq (1 + o(1))\sum_n s(n)^2.$

By the second part of (5),

$\displaystyle \sum_n s(n)^2 \leq 16 |A| \int |f|^{6} \leq 16 |A| (\int |f|^8 )^{1/2} (\int |f|^4)^{1/2},$

and so

$\displaystyle T_4(A) \leq 16^2 |A|^2 \int |f|^4.$

Suppose first that ${A}$ has no solutions to (8) with ${\ell = 2}$. Then

$\displaystyle \int |f|^4 \leq 4 |A|^2 .$

Indeed there are at most ${2|A|^2}$ solutions to ${a+b= c+d}$ with either ${a=b}$ or ${c=d}$ and also ${\leq 2|A|^2}$ trivial solutions. It follows that

$\displaystyle T_4(A) \leq 4^5 |A|^4.$

Combining this with (4) gives the desired result. Now suppose ${A}$ does have a solution to (8) with ${\ell =2}$, say

$\displaystyle x_1 + x_2 = y_1 + y_2. \ \ \ \ \ (9)$

Delete these four elements from ${A}$ to create ${A'}$. Now we claim ${A'}$ does not have a solution to (8) with ${\ell = 2}$. Indeed if there were, say

$\displaystyle a_1 + a_2 = b_1 + b_2,$

then adding this to (9) yields

$\displaystyle a_1 + a_2 + x_1 + x_2 = b_1 + b_2 + y_1 + y_2,$

which contradicts ${N(A) = 0}$. Thus we may apply the above argument to ${A'}$ to obtain

$\displaystyle T_4(A') \leq (1+o(1)) 4^5 |A'|^4,$

and the Proposition follows from (4). ${\spadesuit}$

# Exponential Sums over Small Subgroups

The purpose of this post is to recall a theorem of Bourgain and Konyagin that shows cancellation in exponential sums over multiplicative subgroups of ${\mathbb{F}_p}$, incorporating the point–plane incidence bound due to Rudnev. It is notoriously hard to find cancellation in short exponential sums in ${\mathbb{F}_p}$; for instance, improving the Burgess bound is a fundamental open problem in number theory (see this previous blog post for discussion). Bourgain and Konyagin were able to leverage the sum–product phenomenon to show cancellation in certain sums with as few as ${p^{\delta}}$ terms, improving upon the previous best of ${p^{1/4+\epsilon}}$ due to Konyagin (incidentally, the Burgess bound relies on a simpler sum–product type bound).

Let ${H \leq \mathbb{F}_p^{\times}}$ be a multiplicative subgroup. We define the Fourier transform of ${H}$ via

$\displaystyle \widehat{1_H}(\xi) : = \frac{1}{p} \sum_{h \in H} e_p(-\xi h) , \ \ \ e_p(\theta) : = e^{2 \pi i \theta / p}. \ \ \ \ \ (1)$

In (1) and what follows we adopt the notation of Chapter 4 in Tao and Vu (see also my previous blog post for some basic discussion on the discrete Fourier transform).

For any ${\xi \in \mathbb{F}_p}$, we have the trivial bound

$\displaystyle |\widehat{1_H}(\xi)| \leq \mathbb{P}(H) : = |H| / p,$

which is obtained at the zero frequency. On the other hand, ${1_H}$ has multiplicative structure and we expect it cannot correlate with an additive character in light of the sum–product phenomenon. This was verified by Bourgain and Konyagin (see also these notes of Green).

Theorem 1 (Bourgain-Glibichuk-Konyagin): Let ${H \leq \mathbb{F}_p^{\times}}$ be of size at least ${p^{\delta}}$. Then there is an ${\epsilon = \epsilon(\delta) > 0}$ such that

$\displaystyle \sup_{\xi \in \mathbb{F}_p^{\times}}|\widehat{1_H}(\xi)| \leq \mathbb{P}(H) p^{-\epsilon}. \ \ \ \spadesuit$
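The cancellation in Theorem 1 is easy to observe numerically, for example for the subgroup of ${d^{\rm th}}$ powers (a sketch; ${p = 1009}$ and ${d = 16}$ are arbitrary choices):

```python
# Compare sup_{xi != 0} |hat 1_H(xi)| with the trivial bound P(H) = |H|/p
# for the multiplicative subgroup H of 16th powers in F_p.
import cmath

p, d = 1009, 16
assert (p - 1) % d == 0
H = sorted({pow(x, d, p) for x in range(1, p)})  # subgroup of index d

def fourier(xi):
    """hat 1_H(xi) as in (1)."""
    return sum(cmath.exp(-2j * cmath.pi * xi * h / p) for h in H) / p

sup = max(abs(fourier(xi)) for xi in range(1, p))
print(sup < len(H) / p)  # True: genuine cancellation at nonzero frequencies
```

Here ${|H| = (p-1)/d}$ is well below ${\sqrt{p} \cdot d}$-type thresholds, yet the nonzero frequencies are visibly smaller than ${\mathbb{P}(H)}$.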

Theorem 1 should be compared to the famous Gauss sum estimate (see for instance this previous blog post), but applies to much smaller multiplicative subgroups. The proof relies on three ideas. The first is that if ${|\widehat{1_H}(\xi) |}$ is large for one ${\xi \in \mathbb{F}_p^{\times}}$, then it is large for many ${\xi \in \mathbb{F}_p^{\times}}$. Indeed it follows from (1) that

$\displaystyle \widehat{1_H}(h \xi) = \widehat{1_H}(\xi) , \ \ h \in H. \ \ \ \ \ (2)$

We define the spectrum (see Chapter 4 of Tao and Vu for detailed discussion, as well as these notes of Green) via

$\displaystyle {\rm Spec}_{\alpha}(H) : = \{\xi \in \mathbb{F}_p : |\widehat{1_H}(\xi) | \geq \alpha \mathbb{P}(H) \}.$

By Parseval’s identity of the form

$\displaystyle \sum_{\xi \in \mathbb{F}_p} |\widehat{1_H}(\xi)|^2 = \mathbb{P}(H),$

and (2), we find

$\displaystyle |H| \leq | {\rm Spec}_{\alpha}(H)| \leq \mathbb{P}(H)^{-1} \alpha^{-2}.\ \ \ \ \ (3)$

If ${|H| = p^{\delta}}$, this gives

$\displaystyle \alpha \leq p^{1/2 - \delta},$

which is only useful for ${\delta > 1/2}$ (for instance this works quite well for Gauss sums). Thus we need new ideas to handle the case ${\delta \leq 1/2}$. Note this is in alignment with the principle that basic Fourier techniques intrinsically have a “square root barrier.”

The second idea we will use is that ${H}$ has little additive structure in the following form. Recall the additive energy of ${A}$ and ${B}$ is defined via

$\displaystyle E^+(A,B) = \#\{(a,a' , b , b' ) \in A^2 \times B^2 : a + b = a' + b' \}.$

Note that

$\displaystyle E^+(A,B) = \sum_x r_{A- B}(x)^2 , \ \ \ r_{A-B}(x) : = \#\{(a,b) \in A \times B : x = a -b \}.$
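The identity above is easy to confirm on random sets (a minimal sketch in python):

```python
# Verify E^+(A,B) = sum_x r_{A-B}(x)^2 on random integer sets.
import random
from collections import Counter
from itertools import product

random.seed(2)
A = random.sample(range(100), 12)
B = random.sample(range(100), 15)

energy = sum(1 for a, a2, b, b2 in product(A, A, B, B) if a + b == a2 + b2)
r = Counter(a - b for a, b in product(A, B))
print(energy == sum(v * v for v in r.values()))  # True
```

The identity holds since swapping ${b}$ and ${b'}$ turns solutions of ${a + b = a' + b'}$ into solutions of ${a - b' = a' - b}$.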

Proposition 1 (Sum–Product): Let ${H \leq \mathbb{F}_p^{\times}}$ and let ${A \subset \mathbb{F}_p}$ with ${|H|^2 |A| \leq p^2}$. Then

$\displaystyle E^+(H ,A) \ll |H| |A|^{3/2}. \ \ \ \spadesuit$

Proposition 1 should be compared to the trivial bound ${E^+(H,A) \leq |H| |A|^2}$.

Proof: We will use Rudnev’s point–plane incidence bound (Theorem 3 in this paper). To do so, we note that ${E^+(H,A)}$ counts the number of solutions to

$\displaystyle h + a = h' + a' , \ \ \ a , a' \in A , \ h, h' \in H.$

Since ${H}$ is a multiplicative subgroup, ${|H|^2 E^+(H,A)}$ is the number of solutions to

$\displaystyle hh_1 + a = h' h_1' + a' , \ \ \ a , a' \in A , \ h, h' , h_1 , h_1' \in H.$

This is precisely the number of incidences between the point set ${H \times H \times A}$ and planes of the form ${h x + a = h' y + z}$. Thus by Rudnev's point–plane incidence bound of the form (note there is a condition on the maximum number of collinear points, which is trivially satisfied in our Cartesian product set-up)

$\displaystyle I(P , \Pi) \ll n^{3/2} , \ \ \ |P| = |\Pi| = n,$

we find

$\displaystyle |H|^2 E^+(H,A) \ll |H|^3 |A|^{3/2} . \ \ \ \spadesuit$

We now move on to the third idea, the general principle that ${{\rm Spec}_{\alpha}(H)}$ is additively structured. The following lemma is due to Bourgain and can be found as Lemma 4.37 in Tao and Vu.

Lemma 1 (Additive Structure in Spectrum): Let ${A \subset \mathbb{F}_p}$ and ${0 < \alpha \leq 1}$. Then for any ${S \subset {\rm Spec}_{\alpha}(A)}$, one has

$\displaystyle \#\{ (\xi_1 , \xi_2) \in S \times S : \xi_1 - \xi_2 \in {\rm Spec}_{\alpha^2/2}(A) \} \geq \frac{\alpha^2}{2} |S|^2. \ \ \ \spadesuit$

Lemma 1 roughly asserts that the spectrum is closed under addition. For example, consider the example ${A = \{1 , \ldots , K\}}$ where ${K = o(p)}$. Here ${{\rm Spec}_{\alpha}(A)}$ is an interval of length ${\asymp \alpha^{-1}}$ (there are more sophisticated examples, see this paper of Green).

Proof of Lemma 1: We set ${f = 1_A}$. By assumption we have

$\displaystyle \alpha \mathbb{P}(A) |S| \leq \sum_{\xi \in S} |\widehat{f}(\xi)| = \sum_{\xi \in S} c(\xi) \widehat{f}(\xi) = \frac{1}{p} \sum_{\xi \in S} \sum_{a \in A} c(\xi) e_p(\xi a),$

for some ${c(\xi) \in \mathbb{C}}$ of modulus 1. By changing the order of summation and Cauchy Schwarz,

$\displaystyle \mathbb{P}(A)^2 \alpha^2 |S|^2 \leq \frac{|A|}{p^2} \sum_{a\in A} \sum_{\xi , \xi' \in S} c(\xi) \overline{c(\xi')} e_p((\xi - \xi') a) \leq \mathbb{P}(A) \sum_{\xi , \xi' \in S} |\widehat{f}(\xi - \xi')|.$

Lemma 1 follows from pigeonholing. ${\spadesuit}$

Suppose, for the sake of discussion, that for all ${\alpha}$, ${S \subset {\rm Spec}_{\alpha}(A)}$ does not have additive structure, in the strong form ${r_{S-S}(x) \ll 1}$ for all ${x \neq 0}$. Then, for any set ${T}$, we conclude

$\displaystyle \#\{ (\xi_1 , \xi_2) \in S \times S : \xi_1 - \xi_2 \in T \} \ll |T|,\ \ \ \ \ (4)$

and so by Lemma 1 we have a significant growth from ${{\rm Spec}_{\alpha}(A)}$ to ${{\rm Spec}_{\alpha^2/2}(A)}$. But then applying this again to ${{\rm Spec}_{\alpha^2/2}(A)}$ we have significant growth to ${{\rm Spec}_{\alpha^4/8}(A)}$ and repeating this procedure will eventually contradict the trivial bound

$\displaystyle |{\rm Spec}_{\alpha}(A)| \leq p.$

When ${H}$ is a multiplicative subgroup, we can show a weaker version of (4) using that ${{\rm Spec}_{\alpha}(H)}$ is a union of cosets of ${H}$ via (2) and Proposition 1. We turn to the details.

Proof of Theorem 1: Let ${0 < \alpha \leq 1}$ be chosen small enough so that ${S_{\alpha} : = {\rm Spec}_{\alpha}(H)}$ is nonempty. By Lemma 1,

$\displaystyle \#\{ (\xi_1 , \xi_2) \in S_{\alpha} \times S_{\alpha} : \xi_1 - \xi_2 \in {\rm Spec}_{\alpha^2/2}(H) \} \geq \frac{\alpha^2}{2} |S_{\alpha}|^2,$

that is

$\displaystyle \frac{\alpha^4}{4} |S_{\alpha}|^4\leq \left( \sum_{z \in {\rm Spec}_{\alpha^2/2}(H) } r_{S_{\alpha} - S_{\alpha}}(z) \right)^2 \leq | {\rm Spec}_{\alpha^2/2}(H)| E^+(S_{\alpha} , S_{\alpha}).\ \ \ \ \ (5)$

Now we use Proposition 1 to provide an upper bound for ${E^+(S_{\alpha} , S_{\alpha})}$. By (2), ${S'_{\alpha} : = S_{\alpha} \setminus \{0\}}$ is a union of cosets of ${H}$, say ${S'_{\alpha} = \cup_{x \in C} Hx}$. Thus by the triangle inequality in ${\ell^2}$ and Proposition 1,

$\displaystyle E^+(S'_{\alpha} , S'_{\alpha})^{1/2} \leq \sum_{x \in C} E^+(S'_{\alpha} , Hx)^{1/2} = \sum_{x \in C} E^+(S'_{\alpha}x^{-1} , H)^{1/2} \ll \frac{|S_{\alpha}|}{|H|}|H|^{1/2} |S'_{\alpha}|^{3/4}.$

Combining with (5), we find

$\displaystyle \alpha^4 |H| | {\rm Spec}_{\alpha}(H)|^{1/2} \ll | {\rm Spec}_{\alpha^2/2}(H)| .$

By (3), we find that ${|{\rm Spec}_{\alpha}(H)| > |H| + 1}$ and so

$\displaystyle \alpha^4 p^{\delta / 2} \leq \alpha^4 |H|^{1/2} \ll \frac{| {\rm Spec}_{\alpha^2/2}(H)|}{|{\rm Spec}_{\alpha}(H)|} .\ \ \ \ \ (6)$

Now we let ${1 > \alpha_1 > \alpha_2 > \ldots > \alpha_{J+1} > 0}$, where ${\alpha_{i+1} = \alpha_i^2 / 2}$ and ${\alpha_1 = p^{-\epsilon}}$. Either ${{\rm Spec}_{\alpha_1}(H)}$ contains a nonzero frequency, or we may immediately conclude Theorem 1. Then (6) holds for all ${\alpha_i}$, since ${{\rm Spec}_{\alpha_i}(H)}$ increases in size as ${i}$ increases. We have

$\displaystyle \prod_{i = 1}^{J} \frac{| {\rm Spec}_{\alpha_{i+1}}(H)|}{|{\rm Spec}_{\alpha_i}(H)|} = \frac{| {\rm Spec}_{\alpha_{J+1}}(H)|}{|{\rm Spec}_{\alpha_1}(H)|} \leq p ,$

and so there is a ${1 \leq j \leq J}$ such that

$\displaystyle \frac{| {\rm Spec}_{\alpha_{j+1}}(H)|}{|{\rm Spec}_{\alpha_j}(H)|} \leq p^{1/J}.$

Combining with (6), we find

$\displaystyle p^{- 2^J \epsilon + \delta /2} \frac{1}{2^{J}} \ll \alpha_j^4 p^{\delta/2} \ll p^{1/J}.$

Choosing ${2^J \epsilon \asymp 1}$, we find that

$\displaystyle p^{\delta / 2} \ll \epsilon^{-1} p^{1 / \log \epsilon^{-1}},$

which is a contradiction for ${p}$ large, as long as ${\epsilon}$ is chosen small enough that ${\delta \gg \log^{-1} \epsilon^{-1}}$. ${\spadesuit}$

As we saw in the proof, the sum–product phenomenon asserts that ${H}$ has little additive structure which is in tension with the general property that ${{\rm Spec}_{\alpha}(A)}$ is additively structured.

# Entropy and Sumsets: An example

The following post is a result of a discussion with Imre Ruzsa. Motivated by the following easy inequality in additive combinatorics

$\displaystyle A+2 \cdot A \subset A+A+A , \ \ q \cdot A := \{qa : a \in A\},$

I asked if the following was true for a finitely valued random variable ${X}$:

$\displaystyle H(X+2 \cdot X) \leq H(X+X+X), \ \ \ H(X) := -\sum_{x} \mathbb{P}(X = x) \log_2 \mathbb{P}(X = x).\ \ \ \ \ (1)$

Here all sums are of independent copies of the random variables. The idea is that one might expect ${X+X}$ to be a bit more uniform than ${2 \cdot X}$.

First Imre provided a counterexample to the question

$\displaystyle H(X+ 2\cdot Y) \leq H(X+Y+Y).$

I find this example particularly elegant. Let ${X}$ be uniform on ${\{0,1\}}$ and ${Y}$ be uniform on ${\{0 , \ldots , n\}}$. Then ${X+2 \cdot Y}$ is uniform on ${\{0 , \ldots , 2n+1\}}$, while ${X + Y + Y}$ has support ${\{0 , \ldots , 2n+1\}}$ but is not uniform (there is concentration in the middle thanks to the distribution of ${Y+Y}$).
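Imre's counterexample can be confirmed with a few lines of python (a sketch; ${n = 10}$ is an arbitrary choice):

```python
# X uniform on {0,1}, Y uniform on {0,...,n}: H(X + 2Y) > H(X + Y + Y'),
# since X + 2Y is exactly uniform on {0,...,2n+1} while X + Y + Y' is not.
from collections import Counter
from itertools import product
from math import log2

def entropy(counts):
    """Entropy in bits of the distribution proportional to the counts."""
    total = sum(counts.values())
    return -sum(v / total * log2(v / total) for v in counts.values())

n = 10
X, Y = [0, 1], list(range(n + 1))

left = Counter(x + 2 * y for x, y in product(X, Y))
right = Counter(x + y + y2 for x, y, y2 in product(X, Y, Y))
print(entropy(left) > entropy(right))  # True
```

Here `entropy(left)` is exactly ${\log_2(2n+2)}$, the maximum possible on that support, so the strict inequality follows.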

We then seriously questioned the validity of (1). After some discussion, Imre eventually said something about higher dimensional concentration that made me think one should check (1) for the “Gaussian.” The reason Gaussian is in quotes is that it is not finitely valued as assumed in (1), so strictly speaking we cannot check it for the Gaussian. To see if there was hope, I looked at the differential entropy of a real valued random variable ${G}$ with density ${p}$, defined via

$\displaystyle H(G) := -\int_{-\infty}^{\infty} p(x) \log p(x) dx.$

Let us take ${G}$ to be the Gaussian with mean zero (this is irrelevant for entropy) and variance 1. Recall some basic properties of variance:

$\displaystyle {\rm Var}(aG) = a^2 {\rm Var}(G) , \ \ {\rm Var}(G+G) = 2 {\rm Var}(G),$

where ${a \in \mathbb{R}}$ and ${G+G}$ is understood to be the sum of two independent copies of ${G}$. Thus

$\displaystyle {\rm Var}(G + 2 \cdot G) = 5 , \ \ {\rm Var}(G + G +G ) = 3.$

Since the differential entropy of a centered Gaussian with variance ${\sigma^2}$ is ${\frac{1}{2} \log (2 \pi e \sigma^2)}$, which is increasing in the variance, we indeed see that (1) is not true for the Gaussian. To construct a finitely valued random variable that does not satisfy (1), we can convolve a Bernoulli random variable with itself until (1) fails. (This assumes that passing from discrete to continuous does not destroy (1), which is not obvious and needs checking, as ${2 \cdot X}$ has a strange support condition; for instance, the same argument would prove ${H(2 \cdot G) \geq H(G+G)}$, which is clearly not true for discrete random variables.) Anyway, I wrote some quick python code to check this and found that for ${X = B + B + B}$, where ${B}$ is the random variable of a fair coin flip, we have

$\displaystyle H(X+2 \cdot X) \approx 2.984 , \ \ H(X+X+X) \approx 2.630.$

Here ${X+ 2 \cdot X}$ and ${X+X+X}$ are supported on ${\{0 , \ldots , 9\}}$ and so their entropies are bounded by the entropy of uniform distribution on 10 elements which is

$\displaystyle \log_2 10 \approx 3.322.$
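The quick check described above can be reproduced along the following lines (a sketch computing the distributions exactly; floating point only enters in the final logarithms):

```python
# X = B + B + B for a fair coin B. Compare H(X + 2X') and H(X + X' + X'')
# for independent copies, computing the distributions exactly.
from collections import Counter
from itertools import product
from math import log2

def entropy(dist):
    """Entropy in bits of a distribution given as value -> probability."""
    return -sum(p * log2(p) for p in dist.values())

def conv(d1, d2):
    """Distribution of the sum of independent random variables d1, d2."""
    out = Counter()
    for (u, p), (v, q) in product(d1.items(), d2.items()):
        out[u + v] += p * q
    return dict(out)

B = {0: 0.5, 1: 0.5}
X = conv(conv(B, B), B)                    # Binomial(3, 1/2)
double = {2 * v: p for v, p in X.items()}  # distribution of 2 * X

h1 = entropy(conv(X, double))              # H(X + 2X')
h2 = entropy(conv(conv(X, X), X))          # H(X + X' + X'')
print(round(h1, 3), round(h2, 3), h1 > h2)
```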

Sometimes entropy analogs of sumset inequalities hold and sometimes they do not (see this paper of Ruzsa or this paper of Tao, or a host of work by Madiman and coauthors).