Applying Ergodic Theory to Number Theory

According to Lindenstrauss’s beautiful expository article [Lin], a common application of Ergodic Theory for solving a number theoretical problem involves three steps:

Step 1. Translate your number theoretical problem into a problem about dynamical systems.

Step 2. Classify the invariant measures for these dynamical systems.

Step 3. Use this measure classification to deduce a uniform distribution result, which hopefully implies a solution to your number theoretical problem.

If your number theoretical problem already asks for some uniform distribution result, then it is very natural (although not necessarily easy!) to follow this approach (see for instance the proof in [Fur] of the uniform distribution of fractional parts of n²x when x is irrational). The goal of this post is to work out a number theoretical problem which has no apparent relation with uniform distribution, giving an “ergodic proof” along the steps outlined above. Additionally, we will try to make most of the discussion accessible to advanced high school students.

The problem.

We denote by Z/nZ the set of residue classes modulo n. This set has addition and multiplication operations inherited from the corresponding operations in the set of integers Z. For instance, when n=5 the elements of Z/5Z can be represented as 0,1,2,3,4. Inside Z/5Z we have the equality 3+3=1 (or, if the reader prefers to talk about congruences in Z, we have 3+3 congruent to 1 modulo 5).

This is the problem that we will consider.

Proposition 1. Let p>2 be an odd prime. Let b,c be elements of Z/pZ and consider the function f: Z/pZ → Z/pZ given by f(x)= x² + bx +c. Suppose that all the values f(0), f(1), …, f(p-1) are squares in Z/pZ. Then actually the function f(x) is a square, that is, there is v in Z/pZ such that f(x)=(x+v)².

In other words, the only reason for f(x) to take square values always is just the obvious reason, namely, that f(x) itself is a square. The proof that we give below is a very extended version of the proof of Proposition 12.1 in [PPV]. The reader can surely find alternative proofs or refinements; the point here is to give a toy example where the ergodic approach can solve a number theoretical problem which, a priori, has no obvious relation with dynamics.

Before going into the proof let us briefly recall some elementary facts about Z/pZ. If Z/pZ is an old friend of the reader, then she/he can safely skip the next paragraph.

Some facts about Z/pZ.

Let p be a prime. The first interesting fact about Z/pZ is that this set is actually a field, which means that every non-zero element has a multiplicative inverse. For instance, when p=5 the elements of Z/5Z are 0,1,2,3,4. The inverses of 1,2,3,4 are 1,3,2,4 respectively (for example, 2·3=6 which is 1 modulo 5).

In Z/2Z={0,1} every element is a square. When p>2 is an odd prime the situation is different, and exactly (p+1)/2 of the elements are squares (for instance, the squares in Z/5Z are 0,1,4). Here is an indication for the proof: since 0 is a square in Z/pZ we only have to show that there are exactly (p-1)/2 squares among the non-zero elements. Note that x and -x have the same square and deduce that there are at most (p+1)/2 squares in Z/pZ (which is enough for our purposes!). To get the equality, recall that a quadratic equation can have at most 2 solutions over a field.

Now we go into the proof of Proposition 1.

Step 1: Translation into dynamics.

Let p>2 be an odd prime and consider f(x)=x²+bx+c with b,c in Z/pZ. Completing the square, we can write f(x)=(x+v)²+a with v,a in Z/pZ. Under the hypothesis of Proposition 1, we would like to show that a=0, so that f(x)=(x+v)². For doing this, we will attach a “dynamical system” to each possible value of a in Z/pZ.

Given a in Z/pZ define the map  Ta (x)= x+a from Z/pZ to Z/pZ. We think about this function as acting on Z/pZ and moving its elements after we apply it once, twice, and again and again. The function is just a shift on the elements of Z/pZ, and this is precisely the dynamical system that we want to study.

When a=0, the dynamics of Ta is pretty boring: nothing moves, no matter how many times we apply Ta. When a is a non-zero element in Z/pZ there is a completely different dynamical behavior and we want to take advantage of that.

Step 2: Classification of invariant measures.

A measure in a set X is a way of assigning a (non-negative) mass to the subsets of X. For instance, in the interval [0,1] we can use the notion of length to measure (suitable) subsets of X=[0,1]. A probability measure is a measure with total mass 1 (like the natural length in [0,1]).

If T is a map from X to X, we say that a probability measure is invariant under T if the mass of S is the same as the mass of the pre-image of S under T, for all subsets S of (omitting some technical restrictions on S and T). When T is bijective we can simply require that the measure of S and T(S) is the same; this is the case for our maps Ta.

It is easy to describe what are all the probability measures in Z/pZ: we only need to assign non-negative weights to the p elements of Z/pZ and make sure that all the weights add up to 1. Then, if you what to measure a subset S of Z/pZ you only need to add the weights of the elements that belong to S.

The next result classifies all the invariant probability measures for our functions Ta acting on Z/pZ.

Proposition 2. If a=0, then any probability measure is invariant under Ta. If a is non-zero, then there is a unique invariant probability measure for Ta namely, the measure that assigns weight 1/p to each of the elements of Z/pZ (we call this the average measure).

Proof. The first part is obvious, because Ta(S)=S for all subsets of Z/pZ when a=0. For the second part, first note that the measure that gives mass 1/p to each element is indeed invariant under any shift, hence, under Ta. Conversely, let the positive integer n be the inverse of a modulo p. Then applying n times the function Ta to an element x gives
Hence, all invariant measures must satisfy that the mass of any element x is the same as the mass of x+1. Therefore, all the elements of Z/pZ must have the same mass, and since the total sum must be 1 we see that each element has mass 1/p.

Step 3: Uniform distribution and conclusion.

A dynamical system with a unique invariant measure (and other technicalities) is called uniquely ergodic. As a general fact, when a dynamics is uniquely ergodic then most (if not all) points have orbits uniformly distributed under the unique invariant measure. That is, the frequency with which a given point visits a subset S under the dynamics is proportional to the mass of S (using the unique invariant measure for this dynamical system).

What Proposition 2 shows is that Ta gives a uniquely ergodic dynamic on Z/pZ exactly when a is a non-zero element of Z/pZ, and the unique invariant measure is the average measure. From this one can deduce uniform distribution of orbits under the average measure, but for our purposes the following corollary is enough.

Proposition 3. If a is a non-zero element of Z/pZ then for every element x of Z/pZ, the sequence x, Ta(x), Ta(Ta(x)), … contains all the elements of Z/pZ.

Proof. Let x in Z/pZ be given. Consider the sequence of probability measures m1 , m2 , m3 , …, mk , … where mk is the measure that assigns mass to points t in Z/pZ as follows:

mk(t)=N/k where N is the number of times that t appears among the first k iterates of x under Ta.

For given t, we claim that the sequence mk(t) converges to some limit as k grows to infinity. If x visits at most once the point t then this is clear and the limit is 0. Otherwise, if x visits at least twice the point t then we let r be the number of iterates that takes from the first to the second visit. Since the iterates of x will follow the same path again, this periodicity shows that the limit of mk(t) as k grows exists and it is 1/r. Therefore, the sequence of measures mk converges to some limit measure. A shifting argument among the repeated compositions of Ta shows that the limit of the probability measures mk will be invariant under Ta. But Ta has a unique invariant probability measure, namely, the average measure! therefore, the limit of the measures mk is the average measure. Since the average measure is strictly positive at each point, we see that eventually some mk will be positive at each point. Looking at the definition of mk we see that the iterates of x under Ta will visit each point!

Remark: one can prove Proposition 3 directly without talking about invariant measures (the interested reader can do this as an easy exercise). However, the previous proof gives some idea of the general kind of arguments for deducing uniform distribution from classification of measures.

Finally, we can use this valuable information to prove Proposition 1.

Proof of Proposition 1. Recall that we just need to show that, if f(x)= (x+v)²+a has the property that f(0), f(1), …, f(p-1) are squares in Z/pZ, then necessarily a=0 in Z/pZ.
Suppose on the contrary that f(0), f(1), …, f(p-1) are squares in Z/pZ but a is non-zero. Let S be the set of squares in Z/pZ and note that the elements of S are (allowing repetitions):
(0+v)², (1+v)², …, (p-1+v)²
no matter the value of v in Z/pZ. Since f(0), f(1), …, f(p-1) are squares in Z/pZ, and since f(x)=(x+v)²+a, we see that for each t in S, the element t+a also belongs to S. Hence the function Ta(x)=x+a maps S to itself. Pick any element in S, say 1 (which is a square!). Then Ta(1) also is in S, hence Ta(Ta(1)) also is in S, and so forth. We conclude that the whole set of iterates of 1 under Ta is included in S. If a is non-zero Proposition 3 proves that the set of iterates of 1 under Ta is the whole Z/pZ, hence S=Z/pZ and every element of Z/pZ is a square. This is not possible because for p odd there are only (p+1)/2 squares in Z/pZ. This contradiction shows that a=0, and Proposition 1 is proved.

Well, that was a lot of work for proving Proposition 1 but I hope you enjoyed it. I am not so good saying good bye, so here is an exercise instead:

Do we really need to check that f(0), f(1), …,f(p-1) are squares in Z/pZ for concluding that f(x) itself is a square? Use the same type of arguments to get an improved bound. Use any kind of arguments to get the optimal bound.


[Fur] Furstenberg, Recurrence in ergodic theory and combinatorial number theory. Princeton University Press (1981).

[Lin] Lindenstrauss, Some examples how to use measure classification in Number Theory. Equidistribution in number theory, an introduction. Springer (2007)

[PPV] Pasten, Pheidas, Vidaux, A survey on Buchi’s Problem. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 377 (2010)

About hpasten

I am mainly interested in Number Theory.
This entry was posted in Uncategorized. Bookmark the permalink.