When statistical reasoning goes wrong, it’s often because someone unknowingly assumed ergodicity where that assumption doesn’t hold. This can have dramatic effects in everyday language: I will use the example of incarceration rates, and then present a visual illustration to discuss the role of time scales.

### David and Luigi in jail

I’ll tell you a secret: when I read a statistical statement I often wonder whether it’s a temporal statement or an ensemble statement. Do you do that too? Take this headline for example: “Young black people nine times more likely to be jailed than young white people” from The Guardian. I won’t talk about why this might be, that’s not the point of this post. My point, as usual, is about time versus ensembles. If you read the Guardian article, you’ll find that the headline is an ensemble statement. It’s supposed to convey that the proportion of the ensemble of black people under the age of 18 in the UK who were in jail (in a broad sense) when statistics were collected was nine times higher (0.09%) than the corresponding figure for white people (0.01%).

It would be a temporal statement if it meant that your friend David, who is black, spent around 142 hours in jail before he turned 18, whereas your friend Luigi, who is white, spent only around 16 hours in jail before he turned 18. It obviously doesn’t mean that, but — perhaps less obviously — it also doesn’t mean that when David was born he was in any meaningful way 9 times more likely than Luigi to be in jail before they turned 18 — what happens along a single life path over time is not reflected by these aggregate figures.
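The hour figures in this hypothetical temporal reading follow directly from the article’s percentages. A quick sanity check (the 18-year horizon and the 0.09%/0.01% rates are from the text; everything else is plain arithmetic):

```python
# What would the ensemble fractions mean if (wrongly) read as time fractions?
HOURS_IN_18_YEARS = 18 * 365.25 * 24  # about 157,788 hours

david_hours = 0.0009 * HOURS_IN_18_YEARS  # 0.09% of his first 18 years
luigi_hours = 0.0001 * HOURS_IN_18_YEARS  # 0.01% of his first 18 years

print(round(david_hours))  # ~142 hours
print(round(luigi_hours))  # ~16 hours
```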

The word “likely” does not specify whether the probabilities it reflects are relative frequencies in an ensemble or in time. Such unspecific language is problematic when only one interpretation is correct. So: people’s experiences of the penal system are best not talked about in probabilistic terms. Let’s generalize this recommendation: we shouldn’t talk about anything in probabilistic terms unless we’re convinced that the time and ensemble interpretations of what we’re saying are equivalent. Nassim Taleb, in his latest book, put it laconically as “no probability without ergodicity.”

It’s not just this one example: many statistical statements are phrased in probabilistic language, with the implicit (and often false) assumption that ensemble interpretations and temporal interpretations of that language will be equivalent. That assumption is called the “ergodic hypothesis.” In the Guardian example, reading only the headline and then wrongly assuming ergodicity can quite easily lead to horrendous misinterpretations, so let’s watch our language, seriously.

### Ergodicity and time scales

“What do you want with that baseball bat? I told you I’ll get you your money as time goes to infinity!”

… will not keep the mob off your back for long, even if you’re telling the truth.

The ergodic hypothesis is designed for so-called “fast” systems, meaning for systems where each trajectory (each person) explores all of its possible states (jail or no jail) over time scales that are short compared to the time scale of measurement. In our example, this would be the case if David and Luigi were each thrown in jail twice a month for a few minutes. Since we only care about where they spent the first 18 years of their lives, saying Luigi spent 0.01% of his time in jail would be good enough (if that were true). Of course that’s not true in this example — relax, your friends David and Luigi don’t even know what a jail looks like.

In reality, instead of David and Luigi rotating in and out of jail all the time, there are a small number of people who spend far more than their fair share of time behind bars (the word “fair,” as often in a probabilistic context, has various meanings here).
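A toy simulation can make this contrast concrete. The 0.09% rate is the figure from the article; the population size, number of time steps, and the two jail dynamics are invented for illustration. Both worlds look identical in a snapshot, but individual time averages differ wildly:

```python
import random

random.seed(0)
N, T = 10_000, 100   # assumed: people in the ensemble, time steps observed
RATE = 0.0009        # 0.09%, the figure quoted for the ensemble

# "Fast" (ergodic-style) world: everyone is independently in jail
# with probability RATE at every time step.
fast = [[random.random() < RATE for _ in range(T)] for _ in range(N)]

# Concentrated world: 9 people (0.09% of 10,000) are always in jail,
# everyone else never is.
concentrated = [[i < 9] * T for i in range(N)]

def ensemble_avg(pop, t):
    # fraction of the population in jail at time t (the snapshot view)
    return sum(person[t] for person in pop) / len(pop)

def time_avg(person):
    # fraction of one person's time spent in jail (the temporal view)
    return sum(person) / len(person)

# The snapshot statistic is (roughly) the same in both worlds...
print(ensemble_avg(fast, 0), ensemble_avg(concentrated, 0))

# ...but the worst individual experience is wildly different:
print(max(time_avg(p) for p in fast))          # small for everyone
print(max(time_avg(p) for p in concentrated))  # 1.0: some are always in jail
```

In the fast world, swapping ensemble and time averages is roughly harmless; in the concentrated world, it misdescribes everyone.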

While pondering the fate of David and Luigi, it occurred to me that I should produce an example of an ergodic system — one where it’s ok to switch time and ensemble perspectives — just so we all know what that means. Almost nothing interesting is well modeled as ergodic, so the example will be boring. Here goes: your brain makes visual measurements on a time scale of about 20 milliseconds — if I switch between two images more slowly than this, you will notice the change. If I switch much faster, your brain starts averaging over time, and you will perceive something constant in time that contains both images. Aside: I don’t claim to know anything about brains, I’m just guessing this time scale because computer screens used to refresh their images at roughly 50 Hz (one refresh every 20 milliseconds) and seemed to flicker, while faster screens are nicer.

In Fig. 1 I’ve created four gifs, switching between red and blue at increasing frequency. In the first two we can clearly perceive the red and blue states as distinct — the characteristic time scale of the dynamic is slower than that of the measurement (our vision). The third gif flickers a little, but — at least to my slow brain — it seems kind of purple. That’s because the characteristic time scale of the dynamic is now similar to, or has surpassed, that of the measurement. The final gif is just purple — this is just the static color composed of red and blue with equal weight (RGB code 880088), and I’ve marked it 0 seconds because it’s like switching infinitely fast.

Fig.1: switching color between red and blue at different time scales.

For the first two images, saying “this is a purple square” leaves out information that’s relevant on the time scale of measurement. If we call the third square “purple” we’re also replacing a dynamic description (“it switches every 20 milliseconds between red and blue”) with an average (ensemble or time) description. But because our brains are so slow, to us the square is meaningfully both blue and red “simultaneously,” and the description “purple” is beginning to capture what we need to know.
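The flicker-versus-purple effect can be sketched numerically: model vision as a sliding 20-sample average (the 20-millisecond window guessed above) applied to a signal that alternates between 1 (“red”) and 0 (“blue”). The signal lengths and periods here are illustrative assumptions, not measurements:

```python
# Toy version of Fig. 1: slow vision as a sliding time average.
def switching_signal(period, length):
    # 1 ("red") for the first half of each period, 0 ("blue") for the second
    return [1 if (t % period) < period / 2 else 0 for t in range(length)]

def measure(signal, window=20):
    # sliding time average: our stand-in for a slow visual system
    return [sum(signal[t:t + window]) / window
            for t in range(len(signal) - window)]

slow = measure(switching_signal(period=200, length=1000))
fast = measure(switching_signal(period=4, length=1000))

print(min(slow), max(slow))  # 0.0 1.0 — red and blue remain distinguishable
print(min(fast), max(fast))  # 0.5 0.5 — pinned at the mix: looks "purple"
```

When the switching period is long compared to the measurement window, the measurement tracks the dynamics; when it is short, only the constant average survives — which is exactly the condition under which the static “purple” description becomes adequate.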

Long story short: probabilistic descriptions are dangerous territory. They may be ok for a system where

• any single trajectory through time explores everything that might happen and
• it does that so fast that, on the time scale we’re interested in, it’s as if everything is happening simultaneously.

For David and Luigi that’s obviously not the case, for the red and blue squares it can be.

## 17 thoughts on “Ergodicity, jail, and time scales”

1. Satya Prakash Akula says:

Thank you. Great analysis. Keep going


2. Hamlet says:

Baking a cake takes time: 400 degrees in the oven for 40 minutes. I’ll save time by baking at 16,000 degrees for only one minute!


1. Valeri says:

The outcome dependence on temperature is nonlinear, so your cake will explode in a couple of seconds.


3. David Barnes says:

You have it right. Human visual system latency allows discrete images to be integrated at video frame rates. That is why LCD makers had to increase frame rates to match the perceived performance of CRT, Plasma or (now) OLED. Phosphor-based technologies produce fast exponential decays while LCD produce square-wave-like transitions. The faster, smoother decay of phosphors really helps… Anyway, thanks for working on this important subject; decisions, not displays!


1. M Anand says:

For people to appreciate the above point: please see this webpage on a new-generation mobile like the One Plus 6T or newer. You will notice that even the third picture is not purple; it still shows distinct blue and red colours.


4. João says:

I think it’s implicit that the probability in the Guardian article refers to ensemble probability. Why can’t it be used as an ensemble probability?


1. I agree with you – the Guardian statement is one about the ensemble. The error would be to interpret it temporally.


5. Luigi says:

Thank you for your eye-opening theory, it’s such a transformative tool! I know its point is to advise against confusing ensemble and time averages. But I see that it could also explain what happens when we use a model. (I’m just a biology student, maybe I’m completely wrong.)

Applying your model (or language), the problem of using any model is that we are naturally ignorant of an over-ensemble A* from which our reality (which is the basis of human logic) is itself a path, including all the ensembles you can observe in it.

For example: all investors on earth are an ensemble A, but there is an over-ensemble A* with alternative realities “not R” (including alternative elements from which you can generate ensembles), which makes >>our particular reality<< R, with its ensemble A, an element of (or path in) A*.

This could be seen as a highly synthetic artificial ensemble, but this is exactly the situation when we use a model.

An obvious case: right now, a certain stock-to-flow model has gained popularity and people think it predicts the future price of scarce assets like gold. What it actually does is ex-ante "post-dicting". The back-test shows that the model would have been predictive if it had been used — with the essential condition that using it would not have changed the course of history, in order for the model to be predictive in the first place. …

So implicitly we synthesize an artificial over-ensemble without knowing it. Now your model indicates a pathological state like a fallacy, and it connects ergodic thinking to other related fallacies (like the tautological problem of observer selection bias, or the hated "anthropic principle" implying a survivorship bias, because when we choose a model — any model — we decide against all the alternatives in A*). Is there a rule or a cap in ergodic theory which defines our reality as the top reality, so that fractal Matryoshka dolls like the one above are not allowed?


1. You mean Bitcoin, don’t you? You also have to account for the self-fulfilling part of the strategies of market participants. The model becomes “a meme” and, thanks to powerful coordination mechanisms like CT (CryptoTwitter), exhibits manifest destiny.


6. Neil says:

Thanks for this Ole. You have explained very well the reason why the Guardian’s wording is not as clear as it should be.

“people’s experiences of the penal system are best not talked about in probabilistic terms”.

What would be a better way for the Guardian to report these figures?


1. Good question. It’s been a while since I read the article. As far as I remember, it was possible to extract from the article what was actually meant, which is good.

But the headline was dangerously sloppy. The points are too nuanced to capture in a headline, so perhaps it should be boring and neutral: “UK incarceration rates published,” or something like that.

I’m clearly not in the newspaper business…


7. Lance says:

Hi Ole, so to help me understand – is ensemble similar to taking a snapshot in time of a data set vs a timeseries of a data set? And ergodicity is when the snapshot and timeseries datasets are the same? Thanks.


1. That’s approximately correct. Technically, there are some infinities and transients involved, but I think your characterization captures the spirit of it.


8. Perhaps this is a language issue, but any native English reader will look at the headline and understand that it means either: 1) a randomly selected Black teen is 9x more likely to have been in jail (at least once) than a randomly selected White teen; or 2) “About nine in every 10,000 young black people in the general population were locked up in young offender institutions, secure training centres or secure children’s homes in England and Wales in 2015-16. This compared with one in every 10,000 of those from white ethnic backgrounds”.

No one would think it refers to duration.
Am I missing something here? This post seems to introduce a confusion that doesn’t exist, and then to somewhat address that confusion.
