# Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants

Recently AndersonGafniLemke OliverLowry-DudaZhang and myself uploaded ”Quantitative Hilbert irreducibility and almost prime values of polynomial discriminants” to the arXiv. Let ${H}$ be a large integer,

$\displaystyle V_n(H) = \{x^n + a_{n-1} x^{n-1} + \cdots + a_0 : a_{n-1} , \ldots , a_0 \in \mathbb{Z} \cap [-H,H]\},$

and

$\displaystyle E_n(H) = \#\{f \in V_n(H) : {\rm Gal}(f) \neq S_n\}.$

A classical principle going back to at least Hilbert is that a typical polynomial should have Galois group ${S_n}$, and this would suggest ${\#E_n(H)}$ is significantly smaller than ${\#V_n(H)}$. Van der Waerden conjectured that ${E_n(H) = o(\#V_n(H) /H)}$. Van der Waerden’s conjecture was almost proved quite recently by Chow and Dietmann (for ${n \neq 7,8,10}$), except they are not able to estimate

$\displaystyle E_n(H , A_n) = \#\{f \in V_n(H) : {\rm Gal}(f) = A_n\}.$

Gallagher showed, using the large sieve, that

We show the following.

${\ }$ Theorem 1: Let ${H}$ be a positive integer. then

$\displaystyle \#E_n(H , A_n) = O\left(\frac{\#V_n(H) H^{O(1/n)}}{H^{2/3}} \right). \ \ \ \spadesuit$${\ }$

We note that Bhargava recently announced a proof of ${E_n(H , A_n) = O(\#V_n(H) / H)}$, which is stronger than Theorem 1. We remark that our techniques also apply to counting polynomials with almost prime discriminants (see Theorem 1.2 and the related 1.3).

We briefly mention some of the ideas used to prove Theorem 1. Classical algebraic number theory reveals that if the Galois group of a polynomial ${f}$ is ${A_n}$, then ${\mu_p(f) = (-1)^{n+1}}$, where ${\mu_p}$ is the Mobius function for ${\mathbb{F}_p[x]}$. Thus we can employ a sieve to estimate ${\#E_n(H , A_n) }$. However, instead of using the large sieve, we use the small sieve (more precisely, the Selberg sieve). Using this naively gives a bound similar to that of Gallagher in (1), as expected. The key advantage of using the small sieve is we can seek improvements if we have good distribution results. After employing Poisson summation our task in then to understand local factors, which turns out to be basically

$\displaystyle \widehat{\mu_p}(\xi_{n-1} , \ldots , \xi_0) =,\ \ \ \ \$

$\displaystyle = \frac{1}{p^{n}}\sum_{a_0 , \ldots , a_{n-1} \in \mathbb{F}_p} \mu(x^n + a_{n-1} x^{n-1} + \ldots + a_0) \psi_p(\xi_{n-1} a_{n-1} + \cdots + a_0 \xi_0),\ \ \ \ \ (2)$

where ${\psi_p : \mathbb{F}_p \rightarrow \mathbb{F}_p}$ is an additive character. This is morally the same problem as estimating  $\displaystyle \sum_{n \leq x} \mu(n) e(nx)$, where ${\mu}$ is the classical Mobius function. Baker and Harman showed that GRH implies that for any ${\epsilon > 0}$

$\displaystyle |\sum_{n \leq x} \mu(n) e(nx)| \ll x^{3/4 + \epsilon}.$

As the Riemann hypothesis is known in the function field setting, such a bound should also hold for (2). Indeed Porritt showed this is the case and this allows us to establish Theorem 1.

# Effective results on the size and structure of sumsets

For a set ${A \subset \mathbb{Z}^d}$ we define the ${N}$-fold sumset of ${A}$ via

$\displaystyle NA : = \{a_1 + \ldots + a_N : a_i \in A\}.$

We have the following result of Khovanskii from 1992.

Theorem 1 (Polynomial size of iterated sumsets): Let ${A \subset \mathbb{Z}^d}$ of size ${n}$. Then there exists a ${N_A \geq 1}$ and ${c_0 , \ldots , c_d \in \mathbb{Q}}$ such that for ${N \geq N_A}$

$\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit$

One part of a recent work, Andrew Granville, Aled Walker, and I were able to provide effective bounds for ${N_A}$. We let

$\displaystyle w(A) = \max_{a , b \in A} ||a-b||_{\infty}.$

Theorem 2 (Effective Khovanskii): Let ${A \subset \mathbb{Z}^d}$ of size ${n}$. Then there exists ${c_0 , \ldots , c_d \in \mathbb{Q}}$ such that for any

$\displaystyle N \geq 2^{2n} n^{n^3 + 1} w(A)^{n^3}.\ \ \ \ \ (1)$

one has

$\displaystyle |NA| = \sum_{j=0}^d c_j N^j. \ \ \ \spadesuit$

It is not clear that (1) is best possible. Certainly there has to be some dependence on how ${A}$ lies in space, as can be seen by taking the elements of ${A}$ to be increasingly spread out. Curran and Goldmakher showed that when ${|A| = d+2}$, one needs ${N_A}$ at least the order of ${ \omega(A)^d }$

We first recall an elegant proof of Theorem 1 due to Nathanson and Ruzsa, which is also the starting point for Theorem 2.

Proof of Theorem 1: We consider ${\mathbb{Z}_{\geq 0}^n}$, equipped with ${\leq}$, the lexicographical ordering. Let ${A = \{a_1 ,\ldots , a_n\}}$. We then have a map

$\displaystyle \phi : \mathbb{Z}_{\geq 0}^n \rightarrow \bigcup_{j=0}^{\infty} jA,$

via

$\displaystyle \phi(x_1 , \ldots , x_n) =x_1 a_1 + \ldots + x_n a_n.$

It is worth noting that if ${\phi}$ is injective, we immediately deduce Theorem 1 by the stars and bars. Typically ${\phi}$ is not injective, but it turns out to not be a significant problem. We let ${U}$ be the set of elements ${x}$ such that there exists a ${y}$ with ${||y||_1 = ||x||_1}$, ${y < x}$, and ${\phi(y) = \phi(x)}$. We call any element of ${U}$ useless. They are deemed useless as

$\displaystyle |NA| = \{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U.$

There is nothing intrinsic that makes elements of ${U}$ useless, rather it is a consequence of our choice of lexicographical ordering. One can check that ${U}$ is closed under translations from elements of ${\mathbb{Z}_{\geq 0}^n}$.

We need a way to count the elements of ${U}$ and thus propose another definition. We let ${\leq_{{\rm unif}}}$ be the partial ordering where ${x \leq_{{\rm unif}} y}$ if ${x}$ is smaller than ${y}$ coordinate-wise. Let ${U_{\min}}$ be the elements ${x \in U}$ such that there is no ${y\in U}$ with ${y <_{{\rm unif}} x}$. Dickson’s lemma implies that ${U_{\min}}$ is finite. For a set ${U'}$, we let

$\displaystyle B(N, U') = \{x \in \mathbb{Z}_{\geq 0 }^n: ||x||_1 = N , x >_{{\rm unif}} u, \ \text{for all} \ u \in U'\}.$

Thus we have, by the inclusion-exclusion principle,

$\displaystyle | \{x \in \mathbb{Z}_{\geq 0}^n : ||x||_1 = N\} \setminus U| = \sum_{U' \subset U_{\min}} (-1)^{|U'|}| B(N , U')|.$

Thus it is enough to show for any finite ${U'}$, ${\#B(N,U')}$ is polynomial in ${N}$, for ${N}$ large enough. This follows from the same stars and bars argument mentioned above, as long as

$\displaystyle N \geq \max_{u \in U'} ||u||_{\infty}, \ \ \ \ \ (2)$

and Theorem 1 follows. ${\spadesuit}$

Note that this proof does not give any effective bound on ${N_A}$, as we do not have any control over the set ${U_{\min}}$. In particular, one would like to have a bound on the ${\ell^{\infty}}$ norm of the elements of ${U_{\min}}$, in light of (2). In general, one cannot make Dickson’s lemma quantitative, but in our case we can use the structure of ${U_{\min}}$ to do so. The point is that ${U}$ is defined by linear relations, so one can appeal to Siegel’s lemma.

Proof (sketch) of Theorem 2: We translate ${A}$ so that ${0 \in A}$, which does not effect the problem (in particular, ${w(A)}$ remains unchanged). We build upon the proof of Theorem 1. Suppose ${x \in U_{\min}}$. As ${x \in U}$, there is a ${y \in \mathbb{Z}_{\geq 0}^n}$ such that ${||x||_1 = ||y||_1}$, ${y < x}$, and ${\phi(x) = \phi(y)}$. Thus

$\displaystyle \sum_{i \in I} x_i a_i = \sum_{j \in J} y_i a_j.$

As ${x \in U_{\min}}$, one can check that ${I\cap J = \emptyset}$. We now construct a matrix ${M}$ as follows. The first row has ${\#I}$ 1’s followed by ${\#J}$ ${-1}$‘s. The remaining ${d}$ rows are give by ${(a_i)_{i \in I}}$ and ${(-a_j)_{j \in J}}$. Thus

$\displaystyle M (x,y)^T = 0 \ \ \ \ \ (3)$

One would hope to apply Siegel’s lemma, which asserts that (3) has a solution such that ${||(x,y)||_{\infty}}$ is small. The primary issue is that this small solution, ${x^*}$ may have nothing to do with ${(x,y)^T}$. However one can translate ${(x,y)^T}$ by a multiple of ${x^*}$ to create a solution that is small in a single coordinate. Then one “forgets” this coordinate and tries proceeds by induction. A secondary issue is that the ${x^* \in \mathbb{Z}^n}$, may have negative coordinates, but this turns out to not be a critical issue. All of this carried out in section 6 of the aforementioned paper with Granville and Walker. ${\spadesuit}$

# Kuratowski Closure and Topological Spaces

This post was the result of conversations with Chris Gartland (from several years ago). In what follows we outline an alternate definition of a topological space due to Kuratowski.

We recall Kuratowski’s closure axioms for a set ${X}$. These are a set of axioms which can be used to define a topology on a set. Here, one replaces the notion of open (or closed) sets with a closure operation. The closure operation has the advantage of being defined on all subsets of ${X}$. For${A \subset X}$, one may think of the closure of ${A}$, denoted ${\overline{A}}$, to be points that are “close to” ${A}$.

First we recall what is meant by a topological space. We let ${\mathcal{P}(X)}$ be the power set of ${X}$, that is the set of all subsets of ${X}$.

${\ }$ Definition 1: Let ${X}$ be a set and ${T \subset \mathcal{P}(X)}$. Then ${(X,\mathcal{T})}$ is a topology with open sets ${\mathcal{T}}$ if

• (a) ${\emptyset , X \in \mathcal{T}}$,
• (b) if ${A, B \in \mathcal{T}}$, then ${A \cap B \in \mathcal{T}}$,
• (c) if ${A_i \in \mathcal{T}}$ for all ${i \in \mathcal{I}}$, then ${\cup_{i\in \mathcal{I}} A_i \in \mathcal{T}}$.

Furthermore, we say

$\displaystyle \mathcal{C} := \{X \setminus A : A \in \mathcal{T}\},$

are the closed sets of ${X}$. ${\spadesuit}$ ${\ }$

Note we could have defined (a), (b), (c) in terms of closed sets, utilizing DeMorgan’s law. We make use of this implicitly below. We now present the definition of Kuratowski closure.

${\ }$ Definition 2: Let ${X}$ be a set and ${\overline{\cdot} : \mathcal{P}(X) \rightarrow \mathcal{P}(X)}$. We say ${\overline{\cdot}}$ is a closure operation if

• (i) ${\overline{\emptyset} = \emptyset}$,
• (ii) if ${A \subset X}$, then ${A \subset \overline{A}}$,
• (iii) if ${D = \overline{A}}$, then ${\overline{D} = D}$,
• (iv) if ${A,B \subset X}$, then ${\overline{A\cup B} = \overline{A} \cup \overline{B}}$. ${\spadesuit}$

${\ }$

As mentioned above, one can informally think of ${\overline{A}}$ as the set of points in ${X}$ which are “close” to ${A}$. We now explain that these two definitions are equivalent. To define a closure operation from Definition 1, we set

$\displaystyle \overline{A} := \bigcap_{A \subset C \ \text{closed}} C.\ \ \ \ \ (1)$

Note that ${\overline{A}}$ is closed by (c). Given a closure operation, we may define the closed sets to be

$\displaystyle \mathcal{C} : = \{\overline{A} : A \subset X\}.\ \ \ \ \ (2)$

${\ }$ Proposition 1: Let ${(X, \mathcal{T})}$ be a topological space as in Definition 1. Then the operation defined in (1) is a closure operation in the sense of Definition 2. ${\spadesuit}$ ${\ }$

Proof: Let

$\displaystyle \mathcal{C} = \{X \setminus A : A \in \mathcal{T}\},$

be the set of closed sets in ${X}$. Let ${A , B \subset X}$. By (1), we have ${A \subset \overline{A}}$ and thus (ii) holds. As ${X}$ is open, ${\emptyset \in \mathcal{C}}$ and so (i) is satisfied. Note for any closed set, ${C \in \mathcal{C}}$, we have ${\overline{C} = C}$ by (ii) and that ${C}$ itself appears on the right hand side of (1). By (c), we have that

$\displaystyle D = \bigcap_{A \subset C \ \text{closed}} C \in \mathcal{C},$

and so ${\overline{D} = D}$. This proves (iii) and so it remains to show (iv). Note that ${\overline{A} \cup \overline{B}}$ is a closed set containing ${A \cup B}$, by (b). Thus

$\displaystyle \overline{A \cup B} \subset \overline{A} \cup \overline{B}.\ \ \ \ \ (3)$

Now let ${C \in \mathcal{C}}$ that contains ${A \cup B}$. Then ${A, B \subset C}$ and so

$\displaystyle A , B \subset \ \bigcap_{(A\cup B) \subset C \ \text{closed}} C.$

This proves the reverse inclusion of (3) and thus of Proposition 1. ${\spadesuit}$

We now show that a closure axiom can be used to define a topology.

${\ }$ Proposition 2: Let ${\overline{\cdot} : \mathcal{P}(X) \rightarrow \mathcal{P}(X)}$ be a closure axiom as in Definition 2. Then the sets defined in (2) form the closed sets of a topology in the sense of Definition 1. ${\spadesuit}$ ${\ }$

Proof: We implicitly make use of DeMorgan’s law to transition bewtween closed and open sets. By ${(i)}$, we have ${\emptyset \in \mathcal{C}}$ and by (ii), we have ${X \in \mathcal{C}}$. Thus ${(a)}$ is established. Let ${A, B \in \mathcal{C}}$. By (iii), (iv) and (2), we have

$\displaystyle A \cup B = \overline{A \cup B} \in \mathcal{C},$

which establishes (b). Let ${C_i \in \mathcal{C}}$ for all ${i \in \mathcal{I}}$ some index set. Then we have

$\displaystyle \overline{\cap_{i \in \mathcal{I}} C_i}= \cap_{i \in \mathcal{I}} C_i.$

Indeed, the backwards inclusion follows from (ii). To see the forward direction, it is enough to show for any ${i \in \mathcal{I}}$,

$\displaystyle \overline{\cap_{i \in \mathcal{I}} C_i} \subset \overline{C_i} = C_i.$

The second equality is (iii). The first subset inequality follows from (iv) as if ${A \subset B}$ then

$\displaystyle \overline{A} \subset \overline{A} \cup \overline{B} = \overline{A \cup B} = \overline{B}.$

We apply this with ${A = \cup_{i\in \mathcal{I}} C_i}$ and ${B = C_i}$. ${\spadesuit}$

Recall if ${(X , \mathcal{T}_X)}$ and ${(Y , \mathcal{T}_Y)}$ are topological spaces, then

$\displaystyle f : X \rightarrow Y,$

is said to be continuous if the pre-image of every open set in ${Y}$ is open in ${X}$, that is

$\displaystyle f^{-1}(B) \in \mathcal{T}_X,\ \ \ \ \ (4)$

for all open ${Y}$. As ${f^{-1}(Y \setminus B) = X \setminus f^{-1}(B)}$, (5) is equivalent to the analogous definition for closed sets. It turns out that we may equivalently define a map to be continuous if

$\displaystyle f(\overline{A}) \subset \overline{f(A)}.\ \ \ \ \ (5)$

This can be informally interpreted as points that are close to ${A}$ are mapped to points that are close to ${f(A)}$. We now prove this.

${\ }$ Proposition 3: Let ${f : X \rightarrow Y}$ be a function. Then ${f}$ is continuous in the sense of (4) if and only if it is continuous in the sense of (5). ${\spadesuit}$ ${\ }$

Proof: We start with the forward direction. Let ${A \subset X}$. Then by (4), ${f^{-1}(\overline{f(A)})}$ is closed. As ${f(A) \subset \overline{f(A)}}$, we have

$\displaystyle A \subset f^{-1}(\overline{f(A)}).$

Since the right hand side is closed, by (ii) and (iii), we have

$\displaystyle \overline{A} \subset f^{-1}(\overline{f(A)}).$

Applying ${f}$ to both sides establishes (5).

Now we show the reverse implication. Let ${B}$ be a closed set. Then by (5),

$\displaystyle f(\overline{f^{-1}(B)}) \subset \overline{f(f^{-1}(B))} =\overline{B} = B .$

As any element which is mapped to ${B}$ by ${f}$ must lie in ${f^{-1}(B)}$, we have

$\displaystyle \overline{f^{-1}(B)} \subset f^{-1}(B).$

By (ii), equality holds and thus ${f^{-1}(B)}$ is closed. ${\spadesuit}$

# Exponential Sums along Oblique Lines

Thanks to Changhao Chen, Burak Erdoğan, and Igor Shparlinski for useful discussions surrounding this post.

Let ${k}$ be a positive integer (which we take later to be ${\geq 11}$). We consider the exponential sum

$\displaystyle S(x,y) : = \sum_{n=N}^{2N} e(xn + yn^k),\ \ \ \ \ (1)$

where ${e(x) : = e^{2\pi i x}}$. We are interested in bounds for

$\displaystyle \sup_{(x,y) \in \mathcal{L}_z} |S(x,y)|,$

where ${\mathcal{L}_z \subset [0,1)^2}$ are a family of sets indexed by some parameter ${z\in \mathbb{R}}$. For simplicity, we only consider, for fixed ${(a,b) \in \mathbb{R}}$,

$\displaystyle \mathcal{L}_z = \{(x,y) \in [0,1)^2 : ax + by = z\}.$

We would like to show that for most ${z}$, (1) is small. This is supported by the usual heuristic that we expect square root cancellation in (1). On the other hand, ${S(x,y)}$ is large for some special values of ${(x,y)}$ (say ${(0,0)}$), so it certainly matters how the ${\mathcal{L}_z}$ lie in space. Burak Erdoğan and I studied these types of questions (also the topic of this previous blog post) motivated from understanding the fractal dimension of solutions to certain PDE (see the introduction and references within for a complete history). We let

$\displaystyle s(k) = k(k-1) + 2 \lfloor \sqrt{2k-2}\rfloor - 2.$

${\ }$ Theorem 1 (Chen-Shparlinski): Let ${\epsilon > 0}$ and

$\displaystyle \alpha > 1 - \frac{1}{1 + s(k)}.$

Then for a.e. ${z \in \mathbb{R}}$, with respect to the Lebesgue measure,

$\displaystyle \sup_{(x,y) \in \mathcal{L}_z} S(x,y) \ll_z N^{\alpha + \epsilon}, \ \ \ \spadesuit$

${\ }$

Let us make some remarks before beginning the proof. It is worth noting that for very small ${k}$ we know the best value of ${\alpha}$. Indeed, Brandes, Parsell, Poulias, Vaughan, and I showed ${\alpha = 3/4}$ is admissible and cannot be improved. In the aforementioned paper of Erdoğan, we mentioned that one could obtain a variant of Theorem 1 by invoking Vinogradov’s mean value theorem, which is a best possible bound for

$\displaystyle \int_{[0,1]^k} |\sum_{n=N}^{2N} e(x_1 n + x_2 n^2 + \cdots + x_k n^k)|^{p}.\ \ \ \ \ (2)$

The guiding principle is that if an exponential sum is large at a single point, then one can create many other points where the exponential sum is large. On the other hand, there cannot be too many points where this occur as (2) is small. This is a somewhat unsatisfactory approach, as it is not clear that the ${k}$ variable mean value in (2) is the right tool to analyze the two variable ${S(x,y)}$. Recently, Chen and Shparlinski instead utilized the following two variable mean value type theorem of Wooley, which turns out to improve the bounds a bit and simplify the proof.

${\ }$ Theorem 2 (Wooley): Suppose ${k \geq 11}$ is an integer. Then for any ${\sigma \geq s(k)}$

$\displaystyle \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \leq N^{\sigma - k - 1 + o(1)}. \ \ \spadesuit$

${\ }$

Note that Theorem 2 is best possible, in a certain sense. By considering a small ${N^{-1} \times N^{-k}}$ rectangle near ${(0,0)}$, we see

$\displaystyle \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \gg N^{\sigma - k - 1}.$

Thus Theorem 2 cannot be improved much, for the values of ${\sigma}$ for which it applies. It is not clear that the range of ${\sigma}$ is best possible. A natural conjecture is that Theorem 2 holds for

$\displaystyle \sigma > 2(k+1) .$

Such an improvement would improve Theorem 1.

Proof of Theorem 1: We only prove for ${N=2^j}$, for simplicity. The reader may consult Chen and Shparlinski’s paper for the general case, where the additional idea is to employ the completion technique.

Let ${\epsilon > 0}$ and ${0 < \alpha < 1}$. We partition ${[0,1)^2}$ into a grid of ${O(N^{2 \alpha - k - 3 - 2 \epsilon})}$ small rectangles of size approximately

$\displaystyle N^{\alpha - 2 - \epsilon} \times N^{\alpha - k - 1 - \epsilon}.$

We label these rectangles by

$\displaystyle \cup_{R \in \mathcal{R}} R.$

The point is that (1) is does not change much on such rectangles. Indeed it is easy to check, using ${e(x) = 1 + O(x)}$, that (for ${N}$ large enough) if

$\displaystyle |S(x,y)| \geq N^{\alpha},$

for some ${(x,y) \in R}$, then

$\displaystyle |S(x',y')| \geq N^{\alpha}/2,$

for any ${(x',y') \in R}$. We let ${\mathcal{R}_{\alpha} \subset \mathcal{R}}$ consist of the rectangles ${R}$ such that there is a ${(x,y) \in \mathcal{R}}$ with ${|S(x,y)| \geq N^{\alpha}}$. Combining this with the mean value estimate in Theorem 2, we see that ${\#\mathcal{R}_{\alpha}}$ cannot be too large.

Indeed, Markov’s inequality and Theorem 2, we see that for ${\sigma \geq s(k)}$,

$\displaystyle N^{2 \alpha -2 -k - 1 - 2 \epsilon} N^{\alpha \sigma} \#\mathcal{R}_{\alpha} \leq \int_0^1 \int_0^1 |S(x,y)|^{\sigma} dx dy \leq N^{\sigma - k - 1 + o(1)}.$

Thus

$\displaystyle \#\mathcal{R}_{\alpha} \leq N^{(1-\alpha)\sigma -2\alpha+2 + 2 \epsilon + o(1)}.\ \ \ \ \ (3)$

We now consider the image of these rectangles under the map

$\displaystyle (x,y) \mapsto ax + by.$

We have

$\displaystyle \{z\in \mathbb{R} : |S(x,y)| \geq N^{\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} \subset f\left(\bigcup_{R \in \mathcal{R}_{\alpha}}R\right) = \bigcup_{R \in \mathcal{R}_{\alpha}}f(R),$

where

$\displaystyle f(x,y) = ax+by.$

Note that ${f}$ does not distort rectangles too much, so that

$\displaystyle \lambda(f(R)) \ll_{a,b} N^{\alpha - 2 + \epsilon}.$

where ${\lambda}$ is the Lebesgue measure. Thus by subadditivity of the Lebesgue measure and (3)

$\displaystyle \lambda(\{z\in \mathbb{R} : |S(x,y)| \geq N^{\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} )\ll_{a,b} N^{\alpha - 2 +\epsilon} N^{(1-\alpha)\sigma -2\alpha+2 + 2 \epsilon + o(1)}.$

Note here that ${N}$ is fixed. What we actually care about is what happens for a fixed ${z}$ and ${N \geq N(z)}$ for some large ${N(z)}$. There is a standard trick from probability (or analysis) to apply the Borel-Cantelli lemma. We first apply the above result with ${N = 2^j}$ to find

$\displaystyle \lambda(\{z\in \mathbb{R} : |S(x,y)| \geq 2^{j\alpha}, \ \ \text{for some} \ (x,y) \in \mathcal{L}_z\} ) \leq 2^{j((1-\alpha)\sigma - \alpha + 2 \epsilon )}.\ \ \ \ \ (4)$

By the Borel-Cantelli lemma, if

$\displaystyle \sum_{j=1}^{\infty} 2^{j((1-\alpha)\sigma - \alpha + 2 \epsilon )}< \infty,$

then the set of ${z}$ such that (4) holds for infinitely many ${j}$ has measure zero. This is implied by

$\displaystyle (1-\alpha)\sigma - \alpha < 0,$

as long as ${\epsilon}$ is sufficiently small. This, in turn, is implied by

$\displaystyle \alpha > \frac{\sigma}{\sigma + 1} = 1 - \frac{1}{\sigma + 1} . \ \ \ \spadesuit$

# A large gap in a dilate of a set

I recently uploaded “a large gap in a dilate of a set,” to the arXiv. We prove the following.

${\ }$ Theorem 1: Let ${A \subset \mathbb{F}_p}$ with at least two elements. Suppose ${N \leq 2p/|A| -2}$. Then there is a ${x \in \mathbb{F}_p}$ and ${d \in \mathbb{F}_p^{\times}}$ such that

$\displaystyle (d\cdot A + x ) \cap \{1 , \ldots , N\} = \emptyset. \ \ \ \spadesuit$

${\ }$

As the note is only 3 pages, we do not remark on the proof (which uses the polynomial method) but elaborate on some related topics. Note by the pigeon-hole principle, Theorem 1 is true for every ${d}$ if we only insist ${N \geq p/|A| - 1}$. Thus we have effectively doubled the bound coming from the pigeon hole principle. Note without dilation, Theorem 1 is false, as one can take ${|A|}$ equally spaced elements.

One can ask what happens for Theorem 1 if one does not allow translation by ${x}$. It turns out that one cannot hope to go beyond ${N \geq 2p/|A|}$, as it was shown in this paper of Chen, Shparlinski and Winterhof, using the example

$\displaystyle A = \{1 , \ldots , p/N\} \cup -\{1 , \ldots , p/N\} .$

It is an interesting question to decide to what extent Theorem 1 is true with translation by ${x}$. We remark this is in a similar spirit to the lonely runner conjecture.

It would be nice if Theorem 1 were true for ${N \geq C p/|A|}$ for all ${C}$, even in the special case when ${|A| \sim \sqrt{p}}$. Certainly this is true for a random set, without the need for dilation.

In particular, this would give us hope in answering an old question of Erdös, which we recall. A Sidon set in a abelian group is a set such that ${a+b = c+d}$ with ${a,b,c,d \in A}$ implies ${\{a,b\} = \{c,d\}}$. Let ${r_2(N)}$ be the largest Sidon set contained in ${\{1 , \ldots , N\}}$. Erdös asked if

$\displaystyle r_2(N) = N^{1/2} + O(1).$

There are constructions of Sidon sets of size ${N^{1/2}}$ (for some ${N}$) coming from ${ \mathbb{Z} /N \mathbb{Z}}$ for well-chosen ${N}$. The hope would be to dilate the set in ${\mathbb{Z}/N \mathbb{Z}}$ so there is a large gap of size ${g}$, thus finding a Sidon set of inside of ${\{1 , \ldots , N-g\}}$ in place of ${\{1 , \ldots , N\}}$. It is actually not known if we can take ${N }$ to be a prime in the modular construction, which may be found in this nice survey of O’ Bryant. This is certainly a question of interest.

On the other hand, one can hope to improve Theorem 1 for some of these constructions. It turns out one can easily check that Ruzsa’s construction which is the set ${A \subset \mathbb{Z}/ (p(p-1)) \mathbb{Z}}$ does not admit large gaps. Indeed the set has size ${\sim p}$ but any dilate of ${A}$ does not contain a gap significantly longer than ${2p}$. This also shows Theorem 1 is false for general cyclic groups. The point is that in his construction the natural projection to ${\mathbb{Z}/(p-1)\mathbb{Z}}$ maps ${A}$ surjectively.

This seems to be a bit of a red herring in the application to Sidon sets. On the other hand, for the well-known construction of Bose-Chowla (contained in the aforementioned survey), the analog of Theorem 1 is true and there is no reason to suspect that it cannot be improved. In fact, in this case a proof also proceeds by the polynomial method.

# An Analytic Approach to the Cardinality of Sumsets

Dávid Matolcsi, Imre RuzsaDmitrii Zhelezov and I just uploaded our paper, “An analytic approach to cardinalities of sumsets” to the arXiv. Alongside Ben Green, we also uploaded a follow up paper, “A weighted Prékopa-Leindler inequality and sumsets with quasi-cubes.”

Our focus is sumsets of finite subsets of ${\mathbb{Z}^d}$. For instance, if ${A \subset \mathbb{Z}}$ and ${d}$ is a positive integer, we have

$\displaystyle |A^d + A^d| =|A+A|^d.$

If ${A}$ is not an arithmetic progression, it is known that

$\displaystyle |A+A| \geq 2|A|,$

and so we obtain

$\displaystyle |A^d + A^d| \geq 2^d |A|.\ \ \ \ \ (1)$

It is natural to look to obtain analogs of (1) for more general subsets of ${\mathbb{Z}^d}$. For instance, the Brunn-Minkowski inequality implies the continuous analog,

$\displaystyle \mu(X+X) \geq 2^d \mu(X),$

whenever ${X}$ is a compact subset of ${\mathbb{R}^d}$. In the discrete setting such a result is not true. First of all, the notion of cardinality does not distinguish dimension. Thus we can take a one dimensional arithmetic progression and place in ${\mathbb{Z}^d}$, which will not obtain the growth in (1). For a legitimate ${d}$ dimensional set, one can take an arithmetic progression alongside ${d-1}$ random points and ${|A+A| / |A|}$ will grow only linearly in ${d}$. There are some situations when we can establish (1). For instance, if ${A}$ contains ${\{0,1\}^d}$, then Green and Tao showed

$\displaystyle |A+A| \geq 2^{d/2} |A|.$

Freiman (see chapter 5 of Tao and Vu) also showed if ${A \subset \mathbb{Z}^d}$ such that every hyperplane intersects ${A}$ in ${\ll_{d, \epsilon} |A|}$ points, then

$\displaystyle |A+A| \geq (2^d - \epsilon) |A|.$

We also mention the work of Gardner and Gronchi, who prove inequalities for general ${d}$-dimensional sets. The drawback there is that the extremal examples are nearly one dimensional, and particular they only derive growth that is linear in the dimension. We provide a result in a similar spirt to Green and Tao. To state our result, we need a definition. We define a quasi-cube inductively (on the dimension). Any two point set is a 1-dimensional quasi-cube. A ${d}$ dimensional set ${U}$ is a quasi-cube if ${U = U_0 \cup U_1}$ where ${U_0 ,U_1}$, where ${U_j = (x_j + L_j) \cap U}$, with ${x_j \in \mathbb{Z}^d},$ ${L_j}$ a hyperplane and ${U_0,U_1}$ are themselves quasi-cubes. A cube is a quasi-cube. Also, the trapezoid:

$\displaystyle U = \{(0,0) , (1,0), (0,1) , (1,x)\} , \ \ \ \text{with} \ \ x \neq 0,$

is a quasi-cube.

${\ }$ Theorem 1: Let ${A \subset \mathbb{Z}^d}$ be finite. Suppose that $A$ contains $U$ which a subset of a quasi-cube. Then

$\displaystyle |A+A| \geq |U|^{1/2} |A|, \ \ \ \spadesuit.$${\ }$

This has applications to the sum-product problem via the Bourgain-Chang argument and will be explored in a future paper of Pálvölgyi and Zhelezov. We discuss some of the ideas. As mentioned above, Theorem 1 can be thought of as a discrete analog of the Brunn-Minkowski inequality. The Brunn-Minksowski inequality can be proved using the Prékopa-Leindler inequality, and this is the viewpoint we adopt.

To start, for ${A,B \subset \mathbb{Z}}$, we have

$\displaystyle |A+B| \geq |A| + |B| - 1,\ \ \ \ \ (2)$

and for ${X,Y \subset \mathbb{R}^d}$ are compact, we have

$\displaystyle \mu(X+Y) \geq \mu(X) + \mu(Y).\ \ \ \ \ (3)$

Both can be proved in the same way: by finding a translate of ${X}$ and a translate of ${Y}$ in ${X+Y}$ that are almost disjoint. In the continuous setting (3) is used to establish a functional analog. For compactly support and bounded ${f,g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}}$ we define

$\displaystyle f \mathbin{\overline *} g(z) : = \sup_{x+y =z} f(x) g(y).$

Then the Prékopa-Leindler inequality implies that

$\displaystyle \int f \mathbin{\overline *} g \geq ||f||_2 ||g||_2.\ \ \ \ \ (4)$

When ${f=1_X}$ and ${g=1_Y}$, we obtain a weaker variant of (3):

$\displaystyle \mu(X+Y) \geq 2 \mu(X)^{1/2} \mu(Y)^{1/2}.$

To go in the other direction, one applies (3) to the level sets of ${f}$ and ${g}$Gardner’s survey provides more accurate implications along these lines. Thus it is natural to ask for a functional analog of (2). Indeed, we let ${a,b: \mathbb{Z} \rightarrow \mathbb{R}_{\geq 0}}$ be compactly supported. Then Prékopa showed that

$\displaystyle ||a \mathbin{\overline *} b|| \geq 2 ||a||_2 ||b||_2 - 1.\ \ \ \ \ (5)$

We invite the reader to try to discover a proof, it is rather non-trivial! In any case, the next step in Prékopa-Leindler is to tensorize, as is explained in this blog post of Tao or section 4 of the aforementioned survey of Gardner. The point is the integral inequality (4) can be obtained by induction on the dimension by applying the lower dimensional analog of (4) to functions such as ${f(x_1 , \ldots , x_{d-1} , x_d)}$ with ${x_d}$ fixed. Doing this, one obtains, for compactly support and bounded ${f,g:\mathbb{R} \rightarrow \mathbb{R}_{\geq 0}}$

$\displaystyle \int f \mathbin{\overline *} g \geq ||f||_2 ||g||_2.$

A slight generalization of this inequality quickly implies the Brunn-Minkowski inequality. In the discrete setting, the ${-1}$ present in both (2) and (5) are quite a nuisance, particularly in the tensorization step. To get around this, we observe that

$\displaystyle |A+B+U| \geq |A|+|B| \geq 2 |A|^{1/2} |B|^{1/2},$

${U \subset \mathbb{Z}}$ of size 2. It turns out one has

$\displaystyle ||a \mathbin{\overline *} b \mathbin{\overline *} 1_U|| \geq 2 ||a||_2 ||b||_2 ,$

as was originally observed by Prékopa himself. One has an easier time tensorizing this, and following this one can obtain

$\displaystyle |A+B+\{0,1\}^d| \geq |A|^{1/2} |B|^{1/2}.$

In our work, we take an abstract approach, defining

$\displaystyle \beta(U) = \inf_{A,B \neq \emptyset} \frac{|A+B+U|}{|A|^{1/2} |B|^{1/2}}.$

Note that ${\beta(U) \leq |U+U+U|/|U|}$. Indeed, ${\beta}$ is intended to be a replacement of the usual notion of the doubling constant, ${|U+U|/|U|}$. It turns out for certain large dimensional sets, we can accurately estimate ${\beta}$. For instance, we show that if ${U}$ is a subset of a quasi-cube then

$\displaystyle \beta(U) = |U|,$

which quickly implies Theorem 1. The upper bound follows immediately from the definition of ${\beta}$, it is the lower bound that takes some work. To do this, we show that this is the same as a functional analog and that tensorization occurs in general. We also explore general properties (for instance ${\beta}$ is independent of the ambient group) and present a bunch of conjectures (which may be interpreted as things we were unable to prove or disprove).

The above outline nearly works for subsets of quasi-cubes, though it turns out one needs a stronger 1 dimensional inequality of the form

$\displaystyle ||f \mathbin{\overline *} g \mathbin{\overline *} h_{\delta}||_1 \geq (1+\delta) ||f||_2 ||g||_2,$

where ${h = (1,\delta)}$. The cases ${\delta = 0,1}$ were already known but for other values of ${\delta}$ it is tricker. There are now two proofs: in the original paper we use a variational argument while in the follow up paper we derive it from the Prékopa-Leindler inequality.

This is a sort of “tensorization plus two point inequality argument,” which is present in other works (i.e. Beckner’s inequality).

# The Frobenius Postage Stamp Problem, and Beyond

Andrew Granville and I just uploaded, “The Frobenius postage stamp problem, and beyond” to the arXiv.

Let ${A \subset \mathbb{Z}}$ with ${\min A = 0}$, ${\max A = b}$, and who has greatest common divisor of all the elements is 1. For positive integer ${N}$, we are interested in the structure of

$\displaystyle NA = \{a_1 + \ldots + a_N : a_i \in A\}.$

For instance,

$\displaystyle NA \subset \{0, \ldots ,bN\}.\ \ \ \ \ (1)$

Equality does not hold in general. For instance if ${1 \notin A}$, then ${1 \not NA}$ for any ${N \geq 1}$. We set

$\displaystyle \mathcal E(A) = \mathbb{Z}_{\geq 0} \setminus \cup_{M \in \mathbb{Z}_{\geq 1}} MA ,$

Thus we can refine (1) to

$\displaystyle NA \subset \{0 , \ldots , bN\} \setminus \mathcal E (A)\ \ \ \ \ (2)$

It turns out there is one more obstruction. Note that ${n \in NA}$ if and only if ${bN-n \in bN- NA}$ and so

$\displaystyle bN -n \notin \mathcal E(b-A).$

Thus we may refine (2) to

$\displaystyle NA \subset \{0 , \ldots , bN\} \setminus (\mathcal E (A) \cup (bN - \mathcal E(b-A))).\ \ \ \ \ (3)$

Our main result is the following.

${\ }$ Theorem 1: For ${N \geq b}$, (3) is an equality. ${\spadesuit}$ ${\ }$

Theorem 1 is close to best possible, as can be seen from ${A = \{0,1,b-1,b\}}$. For 3 element sets, we actually show Theorem 1 holds for all ${N \geq 1}$, which contains some of the ideas present in the general case.

We prove Theorem.1 for ${N\geq 2b}$ from first principles. To get from ${2b}$ to ${b}$, here are two key additional inputs: a structural result of Savchev and Chen and Kneser’s theorem from additive combinatorics.

A key definition is ${n_{a,A}}$, which is the smallest element in ${\mathcal P (A) : = \cup_{M \in \mathbb{Z}_{\geq 1}} MA}$ that is equivalent to ${a \ ({\rm Mod}) \ b}$. For instance, it follows from a short argument of Erdös that ${n_{a,A} \in bA}$. Indeed we have

$\displaystyle n_{a,A} = \sum_{j=1}^r a_j.$

If any subsum is ${0 \ ({\rm Mod}) \ b}$, we may remove it to get a smaller element in ${\mathcal{P}(A)}$ that is ${a \ ({\rm Mod}) \ b}$. Thus no subsum is ${0 \ ({\rm Mod}) \ b}$. But this quickly implies that ${r \leq b}$, as desired.

We also study a natural higher dimensional analog of Theorem 1, utilizing some basic tools such as Carathéodory’s Theorem and Mann’s Lemma. In this setting, we provide an analog of Theorem 1, though our bounds are not nearly as good.

# On generating functions in additive number theory, II: Lower-order terms and applications to PDEs

Julia BrandesScott ParsellKonstantinos PouliasBob Vaughan, and I recently upload our paper “On generating functions in additive number theory, II: lower-order terms and applications to PDEs,” to the arXiv. This is somewhat of a sequel to earlier work of Vaughan as well as a recent paper of Burak Erdogan and myself (see also this previous blog post). This work began during the Heilbronn Decoupling and Efficient Congruencing workshop hosted by Kevin Hughes and Trevor Wooley.

In what follows, we adopt Vinogradov’s notation and thus write ${a \ll b}$ when ${a = O(b)}$. Let ${t}$ and ${x}$ be real numbers. We are interested in the exponential sum

$\displaystyle S(t,x) := \sum_{1 \leq n \leq N} e(tn^3 + nx) , \ \ \ e(\theta) : = e^{2 \pi i \theta}.\ \ \ \ \ (1)$

We could replace the exponent 3 by any other integer exponent, but we choose 3 for simplicity, as well as being the case of most interest. Weyl obtained, using what is now known as Weyl differencing, estimates for (1), showing that for almost all ${t}$, and every ${\epsilon >0}$

$\displaystyle \sum_{1 \leq n \leq N} e(tn^2 + nx) \ll N^{1/2+\epsilon} , \ \ \ \sum_{1 \leq n \leq N} e(tn^3 + nx) \ll N^{3/4 + \epsilon}.\ \ \ \ \ (2)$

The quadratic estimate is best possible, as is seen from Parseval. We expect a matching estimate for the cubic exponential sum in (2), but (2) is still the best that is known today. The current project originally started in an attempt to understand better three different proofs of (2). We suppose

$\displaystyle t = a/q + \beta_t , \ \ \ |\beta_t| \leq \frac{1}{q^2}, \ \ \ (a,q) = 1,\ \ \ \ \ (3)$

and then choose ${j}$ so that

$\displaystyle |x - j/q| \leq 1/(2q).\ \ \ \ \ (4)$

By uncertainty principle heuristics, we expect ${S}$, as defined in (1), to be constant on scales

$\displaystyle |t-a/q| \leq \frac{1}{N^3} , \ \ \ |x-j/q| \leq \frac{1}{N}.$

On the other hand ${S(a/q , j/q)}$ is easy to calculate:

$\displaystyle S(a/q , j/q) =\frac{N}{q} \sum_{j=1}^q e(a/q j^3 + nj/q) + O(q).$

In the circle method, one needs precise versions of this heuristic. For instance, it is known (see Theorem 7.2 in Vaughan’s book) that

$\displaystyle S(t,x) = S(a/q , j/q)\frac{ I(\beta_t , \beta_x)}{N} + \Delta,\ \ \ \ \ (5)$

where

$\displaystyle I(\beta_t , \beta_x) : = \int_{1}^N e(\beta_t u^3 + \beta_x u)du.$

and the error term satisfies

$\displaystyle \Delta \ll q(1 + |\beta_x| N + |\beta_t |N^3),\ \ \ \ \ (6)$

Note that the estimate, (6), becomes nontrivial only if ${\beta_x N}$ and ${\beta_t N^3}$ are much less than 1 as predicted by the uncertainty principle. A proof of (6) follows relatively quickly from summation by parts. One should think of the main term as being of size roughly ${N/\sqrt{q}}$, in light of Gauss sum estimates.

Vaughan, Theorem 4.1, asserts the improvement, in the special case ${\beta_x = 0}$

$\displaystyle \Delta \ll q^{1/2 + \epsilon} (1 + N^3 |\beta_t|)^{1/2}.\ \ \ \ \ (7)$

It is worth noting that (7) allows for a proof of Weyl’s inequality, (1). Unfortunately the appearance of the ${N^3}$ (as opposed to a ${N^2}$) seems to prevent one from going beyond Weyl’s inequality (perhaps because one is using a major arc technique to prove a minor arc result). One consequence of our main result below allows for a proof of Weyl inequality in a similar manner with the presence of a nontrivial linear phase.

Brüdern and Robert improved (6) to

$\displaystyle \Delta \ll q^{2/3 + \epsilon} (1 + |\beta_x| N + |\beta_t |N^3)^{1/2}.\ \ \ \ \ (8)$

In the present paper we show that the estimate (7) still holds in the case when ${\beta_x \neq 0}$, provided one takes into account addition main terms, thus improving (7).

${\ }$ Theorem 1: Suppose ${t}$ and ${x}$ are as above, as in (3) and (4). Then

$\displaystyle S(t,x) =\sideset{}{^\dag} \sum_{\substack{d | q \\ ([[j/d]] , q/d) = 1}} S(d[[j /d]] / q , a/q) \frac{I(\beta_t, x - d [[j/d]] )}{N} + \Delta,$

where

$\displaystyle \Delta \ll q^{1/2 + \epsilon} (1 +\beta_t N^3)^{1/2}.$

Here ${[[z]]}$ is the nearest integer to ${z}$ and ${\sideset{}{^\dag}\sum}$ indicates the sum is over distinct values of ${d [[j/d]]}$. ${\spadesuit}$ ${\ }$

The key point is that there are multiple main terms in Theorem 1, given in the sum, which allows an improvement of the error term in (7)

Theorem 1 also indicates that one may be able to perform a major arc only analysis of the system of equations

$\displaystyle \sum_{j=1}^4 x_i^k - y_i^k, \ \ \ k = 1,3,$

though we do not pursue this in the current paper. We are also able to use Theorem 1 to estimate the following.

${\ }$ Theorem 2: Let ${k = 2}$ or ${k = 3}$. For a.e. ${c \in \mathbb{R}}$ and every ${\epsilon >0}$

$\displaystyle \sup_{N \geq 1} \frac{1}{N^{3/4+\epsilon}} \sup_{t} | \sum_{1 \leq n \leq N} e(t n^k + (t+c)n| \ll 1.\ \ \ \ \ (9)$

Furthermore (9) is best possible, in the sense that one cannot replace the exponent ${3/4}$ with anything smaller. ${\spadesuit}$ ${\ }$

One can interpret the condition in (9) as the supremum of ${S(t,x)}$ such that ${(t,x)}$ lies on the diagonal line (on the torus), ${u+v = -c}$. Naively one would expect square root cancellation in (9), though it turns out this is not the case. The point is that square root cancellation is a heuristic that only applies to minor arcs, while the supremum in (9) is obtained on a major arc. Thus it is natural to utilize estimates that can handle both minor and major arcs simultaneously, such as the one in Theorem 1. Combining Theorem 2 with previous joint work with Burak Erdo\u gan, we can provide bound for the fractal dimension of the real and imaginary parts of the solution to Airy’s equation,

$\displaystyle u_t + u_{xxx} = 0 , \ \ \ u(0,x) = g(x).\ \ \ \ \ (10)$

${\ }$ Corollary 1 (Fractal Dimension of Airy): Let ${u(t,x)}$ be the solution to (10) with ${g}$ a non-constant characteristic function of an interval. Then the maximum fractal dimension of the real and imaginary parts of ${u(t,x)}$ restricted to

$\displaystyle \{(t,x) \in \mathbb{T}^2 : t+x = -c\},$

is at most ${17/8}$ for a.e. ${c}$. ${\spadesuit}$ ${\ }$

The introduction of the aforementioned joint work contains a detailed motivation and history to this problem. It was also explored a bit in two undergraduate projects, and one can find some pictures they generated in a 2018 project and another in a 2019 project. We finish by noting the bound of ${17/8}$ in Corollary 1 does not match the known (and expected) lower bound of ${3/4}$.

# Distinct Consecutive Differences

Imre RuzsaJózsef SolymosiEndre Szemerédi, and I recently uploaded On distinct consecutive differences. We say ${A = \{a_1 < \ldots < a_k\} \subset \mathbb{R}}$ is convex if for all ${1 \leq i \leq k-2}$, one has

$\displaystyle a_{i+2} - a_{i+1} > a_{i+1}- a_{i}.$

We also adopt Vinogradov’s asymptotic notation. The fundamental question in the area is the following.

${\ }$ Conjecture 1 (Erdős): Suppose ${A \subset \mathbb{R}}$ is convex. Then for any ${\epsilon >0}$, one has

$\displaystyle |A+A| \gg |A|^{2-\epsilon}. \ \ \ \spadesuit$${\ }$

Progress towards Conjecture 1 was originally made by Hegyvári, with significant improvements by several authors. There is a natural barrier in that for convex ${A}$

$\displaystyle |A+A| \gg |A|^{3/2},\ \ \ \ \ (1)$

that was obtained by several methods. In the current work, we present one that allows of a generalization of indepedent interest.

${\ }$ Theorem 1 (Distinct Consecutive differences): Let ${A \subset \mathbb{R}}$ such that the consecutive differences are distinct. Let ${B \subset \mathbb{R}}$ be arbitrary. Then

$\displaystyle |A+B| \gg |A| |B|^{1/2} .\ \ \ \spadesuit$${\ }$

Note that Theorem 1 immediately implies (1). It turns out that Theorem 1 is best possible, up to the constant. In the extremal construction we present, ${A}$ and ${B}$ have very different structure and so it is natural to ask if one can improve upon Theorem 1 in the case ${B= A}$. The proof is short and purely combinatorial, in a similar spirit to a proof of the Szemerédi-Trotter theorem for cartesian products found in these notes of Solymosi

We also provide a short proof that for convex ${A}$

$\displaystyle |A+A-A| \gg |A|^2,$

which is certainly implied by Conjecture 1. It is interesting to note that the proof is quite inflexible in that for each ${x}$ we find one representation of

$\displaystyle x = a+ b - c, \ \ \ a,b,c \in A.$

For instance, I do not see how to find ${|A|^2}$ elements with at least 100 representations as ${a+b-c}$

We also present a proof of an improvement to (1). It is somewhat annoying that all improvements to (1) lead to quite small quantitive bounds. The interested reader should first see this short paper of Schoen and Shkredov, as well as this previous blog post and and this other previous blog post. I also have some informal notes on the spectral method of Shkredov, which I can distribute upon request.

# Recent Progress on a Problem of Sárközy

I would like to thank Brandon Hanson, Giorgis Petridis, and Kevin Ford for helpful contributions to this post.

A well-known problem of Sárközy is the following.

${\ }$ Conjecture 1: Let ${p}$ be a prime and ${S \subset \mathbb{F}_p^{\times}}$ be the set of squares. Suppose ${A+B = S}$. Then either ${|A| = 1}$ or ${|B| = 1}$. ${\spadesuit}$ ${\ }$

As the squares are a multiplicative subgroup, it is natural to guess they cannot be written additively as a sumet. Conjecture 1 is in a similar spirit of a long list of conjectures concerning the independence of multiplication and addition, such as the twin prime conjecture, the abc-conjecture, and the sum-product conjecture.

Progress towards Conjecture 1, as of last week, was summarized in this mathoverflow post and this other mathoverflow post (and the papers referenced within). Sárközy proved that if ${A+B = S}$, then

$\displaystyle \frac{\sqrt{p}}{3 \log p} \leq |A| , |B| \leq \sqrt{p} \log p,\ \ \ \ \ (1)$

which we recall below. In fact, Shparlinski (in Theorem 7) improved (1) and then later Shkredov showed (in Corollary 2.6)

$\displaystyle (1/6 - o(1)) \sqrt{p} \leq |A| , |B| \leq (3 + o(1)) \sqrt{p}.$

Brandon Hanson and Giorgis Petridis, utilizing the polynomial method, recently made significant progress towards Conjecture 1.

${\ }$ Theorem 1 (Hanson-Petridis): Suppose ${A+B = S}$. Then

$\displaystyle |A| |B| = |S|.$

In particular every element of ${S}$ has a unique representation of the form

$\displaystyle a + b , \ \ \ a \in A \ \ , b \in B. \ \ \ \spadesuit$

${\ }$

Their techniques handle the case where ${S}$ is replaced by any nontrivial multiplicative subgroup, but we focus on the squares for simplicity. In particular, if ${(p-1)/2}$ is a prime, then Conjecture 1 is established. This implies Conjecture 1 is true for infinitely many primes, provided there are infinitely many Sophie Germain primes (yet another conjecture based on the independence of multiplication and addition). Making use of (1) we are able to prove this unconditionally. Here ${\pi(x)}$ is the prime counting function.

${\ }$ Corollary 1: For all but ${o(\pi(x))}$ primes less than ${x}$, Conjecture 1 holds. ${\spadesuit}$ ${\ }$

Proof: Let ${p}$ be a prime. Suppose ${A+B = S}$ with ${|A| , |B| > 1}$. Then by Theorem 1 and (1), ${(p-1)/2}$ has a divisor between ${\frac{\sqrt{p}}{3 \log p} }$ and ${\sqrt{p} \log p}$. By Theorem 6 (followed by Theorem 1 (v)) in Kevin Ford’s work on the distribution of integers with divisors in an given interval, there are at most ${o(x / \log x)}$ such primes. Corollary 1 then follows from the prime number theorem. ${\spadesuit}$ ${\ }$

If ${S = A+B}$, we always have trivial bound

$\displaystyle (p-1)/2 = |A+B| \leq |A||B|, \ \ \ \ \ (2)$

which can be compared to the following bilinear estimate.

${\ }$ Theorem 2: Let ${\chi}$ be the quadratic character modulo ${p}$. Then

$\displaystyle |\sum_{a \in A , b \in B} \chi(a +b ) | \leq \sqrt{p |A| |B|}.$

In particular if ${S = A+B}$, then

$\displaystyle |A| |B| \leq p. \ \ \ \spadesuit$

${\ }$

The proof of Theorem 2, which we give, is standard Fourier manipulations (see chapter 4 of Tao and Vu for more details). As we will see below, Hanson and Petridis make no use of this perspective.

${\ }$ Proof: The second statement follows from the first as if ${S= A + B}$, then

$\displaystyle \sum_{a \in A , b \in B} \chi(a+b) = |A| |B|.$

By Fourier inversion, it is enough to show

$\displaystyle \sum_{a \in A , b \in B , \xi \in \mathbb{F}_p} e_p(\xi (a +b)) \widehat{\chi}(\xi) \leq \sqrt{p |A| |B|}, \ \ \ e_p(\theta) : = e^{2 \pi i \theta / p} ,\ \ \ \ \ (3)$

where

$\displaystyle \widehat{f}(\xi) = \frac{1}{p} \sum_{x \in \mathbb{F}_p} f(x) e_p(-\xi x).$

Note ${ \widehat{\chi}(0) = 0}$, and the usual estimate for Gauss sums implies

$\displaystyle |\widehat{\chi}(\xi)| \leq p^{-1/2} , \ \ \ \xi \neq 0.$

Combining with (3), it is enough to show

$\displaystyle \sum_{\xi \in \mathbb{F}_p^{\times}} |\widehat{1_A}(\xi)||\widehat{1_B}(\xi)| \leq p^{-1} \sqrt{|A| |B|} .$

But this follows from Cauchy-Schwarz and Parseval. ${\spadesuit}$ ${\ }$

Combining (2) and Theorem 2, we see that if ${S = A+B}$, then

$\displaystyle (p-1)/2 \leq |A||B| \leq p.\ \ \ \ \ (4)$

Thus Theorem 1 asserts that the lower bound in (4) is the only possibility. We now proceed towards a proof of Theorem 1. The starting point is a classical theorem of Fermat.

${\ }$ Theorem 3 (Fermat): Let ${a \in \mathbb{F}_p}$. Then ${a \neq 0}$ if and only if ${a}$ is a root of ${x^{p-1} - 1}$. ${\spadesuit}$ ${\ }$

There are many proofs of this elementary fact, for instance it is a quick consequence of Lagrange’s theorem. As a consequence, we have the following.

${\ }$ Corollary 2: Let ${a \in \mathbb{F}_p^{\times}}$. Then ${a}$ is a square if and only if ${a}$ is a root to

$\displaystyle x^{(p-1) / 2} - 1. \ \ \ \spadesuit \ \ \ \ \ (5)$

${\ }$

Proof: Every square is a root of (5) as if ${a = x^2}$ for some nonzero ${x}$ then by Theorem 3,

$\displaystyle a^{(p-1)/2} = x^{p-1} = 1.$

Thus the squares are a subset of the roots of (5). On the other hand there are ${(p-1)/2}$ squares and at most ${(p-1)/2}$ roots and so the set of squares is precisely the set of roots of (5). ${\ }$

In is worth noting that there is a significant gap in the degree of the polynomial in (5) as opposed to in Theorem 3, which we make use of later. We now give a polynomial characterization of the cardinality of a set.

${\ }$ Lemma 1: Let ${0 \leq n \leq p}$ and ${A \subset \mathbb{F}_p}$. Then ${|A| \geq n}$ if and only if for any ${d_0 , \ldots , d_{n-1} \in \mathbb{F}_p}$, the equations

$\displaystyle \sum_{a\in A} c_a a^k = d_k , \ \ \ 0 \leq k \leq n-1,$

have a solution in the ${c_a}$. ${\spadesuit}$ ${\ }$

Proof: This is a classical fact about Vandermonde matrices. ${\spadesuit}$ ${\ }$

We now proceed to the proof of Theorem 1, which adopts the Stepanov method of auxiliary polynomials.

${\ }$ Proof of Theorem 1: Let ${A , B \subset \mathbb{F}_p}$ and suppose ${A+B = S}$. We choose ${|A| < (p-1)/2}$, which is possible in light of (4). By Lemma 1, we may choose ${c_a\in \mathbb{F}_p}$ so that ${g \equiv 0}$, where

$\displaystyle g(x) = - 1 + \sum_{a \in A} c_a (x+a)^{|A| -1} .\ \ \ \ \ (6)$

Let

$\displaystyle F(x) = -1 + \sum_{a\in A} c_a (x+a)^D, \ \ \ D = (p-1) / 2 + |A| -1.\ \ \ \ \ (7)$

Our choice of ${c_a}$ will ensure that each ${b \in B}$ is a root of ${F}$ with high multiplicity and that ${F \neq 0}$. By Corollary 2, since ${a+b \in S}$,

$\displaystyle F(b) = -1 + \sum_{a\in A} c_a (b+a)^D =- 1 + \sum_{a\in A} c_a (b+a)^{|A| - 1} = g(b).$

Thus each ${b \in B}$ is a root of (7). Furthermore,

$\displaystyle F'(x) = \sum_{a \in A} c_a D(x+a)^{D-1},$

and so, again by Corollary 2,

$\displaystyle F'(b) = Dg'(b) = 0.$

We may apply this argument for higher derivatives to obtain

$\displaystyle F^{(j)} (b) = 0 , \ \ \ 0 \leq j \leq |A| - 1.$

Thus each ${b \in B}$ is a root of ${F}$ with multiplicity ${|A|}$ and it follows that, if ${F \neq 0}$,

$\displaystyle |A| |B| \leq {\rm deg}(F).\ \ \ \ \ (8)$

It turns out that our previous choice for ${c_a}$ ensure that ${{\rm deg}(F) = (p-1)/2}$ (and is thus nonzero). For instance, since ${g \equiv 0}$, considering the leading term in (6), we have

$\displaystyle \sum_{a \in A} c_a = 0.$

and so the leading term of ${F}$, which is the same, is also zero. The same argument works to show that the coefficient of ${x^j}$ of ${F}$ is zero for ${(p-1)/ 2 + 1 \leq j \leq D}$. Now the constant term of ${g}$ in (6) is

$\displaystyle -1 + \sum_{a \in A} c_a a^{|A| -1} = 0,$

where is the coefficient of the ${x^{(p-1)/2}}$ term in ${F}$ is

$\displaystyle {D \choose (p-1)/2} \sum_{a \in A} c_a a^{|A| - 1} = {D \choose (p-1) / 2} \neq 0.$

Combining this with (8), we find

$\displaystyle |A||B| \leq (p-1)/2.$

The second assertion in the Theorem follows immediately (see also Proposition 2.3 in Tao and Vu). ${\spadesuit}$ ${\ }$

Remark 1: Suppose ${A+B = S}$ with ${|A||B| = (p-1)/2}$. The proof of Theorem 1 reveals that

$\displaystyle F(x) = \alpha \prod_{b \in B} (x- b)^{|A|}, \ \ \ \alpha = {D \choose |A| - 1},$

with ${F}$ and ${D}$ defined (7). Furthermore, we have the identity

$\displaystyle \prod_{a \in A , b \in B} (x - (a+b)) = x^{(p-1)/2} - 1. \ \ \ \spadesuit$

${\ }$

A close variant of the extremal case left by Theorem 1 was studied by Lev and Sonn. Following Sárközy, we now prove (1) (with little concern for the precise constants). The key input is the Weil bounds, and so the square root barrier that appears in (1) is natural (as discussed in this previous blog post).

${\ }$ Proof of (1): Let ${p}$ be a prime and suppose ${A+B = S}$. Without loss of generality, we may suppose ${|B| \geq |A|}$. By (2), it is enough to show

$\displaystyle |B| \lesssim \sqrt{p} \log p.\ \ \ \ \ (9)$

Let ${\chi}$ be the Legendre symbol and consider, for ${A' \subset A}$,

$\displaystyle h(x) = 2^{-|A'|} \prod_{a \in A'} (\chi(x + a) + 1) .\ \ \ \ \ (10)$

Thus ${h \geq 0}$ and by our assumption ${h(b) = 1}$ for any ${b \in B}$ and so

$\displaystyle |B| \leq \sum_{x \in \mathbb{F}_p} h(x) \ \ \ \ \ (11)$

On the other hand, expanding the product in (10) and applying the Weil bounds of the form

$\displaystyle \sum_{x \in \mathbb{F}_p} \chi(x+ a_1) \cdots \chi(x+ a_k) \leq (k-1) \sqrt{p},$

for distinct ${a_1 , \ldots , a_k}$, we find

$\displaystyle |B| \leq \sum_{x \in \mathbb{F}_p} h(x) \leq p 2^{-|A'|} + (|A'|- 1) \sqrt{p} , \ \ \ A' \subset A.\ \ \ \ \ (12)$

This establishes (9) if ${|A| \geq \log p / \log 4}$, since then we may choose ${\log p / \log 4 \leq |A'| \leq \log p / \log 4 + 1}$. Otherwise, we apply (12) with ${A' = A}$ and multiply both sides by ${|A|}$, and using (2), we find

$\displaystyle (p-1) / 2 \leq p 2^{-|A|} + (|A| - 1) \sqrt{p}.$

This implies, crudely, that for ${|A| \geq 2}$,

$\displaystyle p/4 \lesssim \sqrt{p} \log p,$

which is absurd for ${p}$ large (and letting the implied constants handle the case ${p}$ small). ${\spadesuit}$