Formal economics without parallel universes


Ergodicity, jail, and time scales


When statistical reasoning goes wrong, it’s often because someone unknowingly assumed ergodicity where that assumption wasn’t justified. This can have dramatic effects on how we read everyday language, and I will use incarceration rates as an example. I will then present a visual illustration to discuss the role of time scales.

David and Luigi in jail

I’ll tell you a secret: when I read a statistical statement, I often wonder whether it’s a temporal statement or an ensemble statement. Do you do that too? Take this headline from The Guardian, for example: “Young black people nine times more likely to be jailed than young white people.” I won’t talk about why this might be; that’s not the point of this post. My point, as usual, is about time versus ensembles. If you read the Guardian article, you’ll find that the headline is an ensemble statement. It’s supposed to convey that the proportion of the ensemble of black people under the age of 18 in the UK who were in jail (in a broad sense) when the statistics were collected was nine times higher (0.09%) than the corresponding figure for white people (0.01%).

It would be a temporal statement if it meant that your friend David, who is black, spent around 142 hours in jail before he turned 18, whereas your friend Luigi, who is white, spent only around 16 hours in jail before he turned 18. It obviously doesn’t mean that. Perhaps less obviously, it also doesn’t mean that when David was born he was in any meaningful sense nine times more likely than Luigi to end up in jail before turning 18: what happens along a single life path over time is not reflected by these aggregate figures.
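The hour figures above come from (wrongly) reading the ensemble percentages as fractions of one person’s time. A minimal Python sketch of that arithmetic, assuming 18 years of 365.25 days each; the helper name `hours_in_jail` is purely illustrative:

```python
# Assumption: 18 years of 365.25 days; the ensemble percentages 0.09 % and
# 0.01 % are reinterpreted, incorrectly, as fractions of a single life's time.
HOURS_PER_YEAR = 365.25 * 24
YEARS = 18

def hours_in_jail(fraction_of_time: float, years: float = YEARS) -> float:
    """Hours spent in jail if `fraction_of_time` were a time average."""
    return fraction_of_time * years * HOURS_PER_YEAR

david = hours_in_jail(0.0009)  # 0.09 %
luigi = hours_in_jail(0.0001)  # 0.01 %
print(f"David: {david:.0f} h, Luigi: {luigi:.0f} h")  # David: 142 h, Luigi: 16 h
```

The calculation is trivial; the point is that it silently swaps an ensemble quantity for a time quantity.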

The word “likely” does not specify whether the probabilities it reflects are relative frequencies in an ensemble or in time. Such unspecific language is problematic when only one interpretation is correct. So: people’s experiences of the penal system are best not talked about in probabilistic terms. Let’s generalize this recommendation: we shouldn’t talk about anything in probabilistic terms unless we’re convinced that the time and ensemble interpretations of what we’re saying are equivalent. Nassim Taleb, in his latest book, put it laconically as “no probability without ergodicity.”

It’s not just this one example: lots of statistical statements are phrased in probabilistic language, with the implicit (and often false) assumption that ensemble and temporal interpretations of that language are equivalent. That assumption is called the “ergodic hypothesis.” In the Guardian example, reading just the headline and then wrongly assuming ergodicity can easily lead to horrendous misinterpretations, so let’s watch our language, seriously.

Ergodicity and time scales

“What do you want with that baseball bat? I told you I’ll get you your money as time goes to infinity!”

… will not keep the mob off your back for long, even if you’re telling the truth.

The ergodic hypothesis is designed for so-called “fast” systems, that is, systems in which each trajectory (each person) explores all of its possible states (jail or no jail) over time scales that are short compared to the time scale of measurement. In our example, this would be the case if David and Luigi were each thrown in jail twice a month for a few minutes. Since we only care about where they spent the first 18 years of their lives, saying Luigi spent 0.01% of his time in jail would then be good enough (if that were true). Of course, that’s not the case in this example. Relax: your friends David and Luigi don’t even know what a jail looks like.

In reality, instead of David and Luigi rotating in and out of jail all the time, a small number of people spend far more than their fair share of time behind bars (the word “fair,” as often in a probabilistic context, has various meanings here).
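To make the contrast concrete, here is a minimal Python sketch of two hypothetical worlds (a toy model, not the Guardian’s data). In both, a snapshot finds 9 in 10,000 people in jail. In the “rotating” world everyone takes brief turns, so each life path spends 0.09% of its time in jail and time and ensemble averages agree. In the “concentrated” world the same 9 people stay locked up, so the snapshot says nothing about a typical life path. All names and sizes are illustrative assumptions:

```python
# Hypothetical toy model (not data): 10,000 people, 10,000 time steps,
# and at every step exactly 9 of them are in jail (9 in 10,000 = 0.09 %).
N_PEOPLE = 10_000
N_STEPS = 10_000
IN_JAIL_AT_ONCE = 9

def rotating(person: int, t: int) -> bool:
    """'Fast' world: everyone takes short turns, 9 people at a time."""
    return (person + t) % N_PEOPLE < IN_JAIL_AT_ONCE

def concentrated(person: int, t: int) -> bool:
    """'Slow' world: the same 9 people are in jail the whole time."""
    return person < IN_JAIL_AT_ONCE

def ensemble_fraction(world, t: int) -> float:
    """Share of the population in jail at one moment (a snapshot)."""
    return sum(world(p, t) for p in range(N_PEOPLE)) / N_PEOPLE

def time_fraction(world, person: int) -> float:
    """Share of the time one person spends in jail (one life path)."""
    return sum(world(person, t) for t in range(N_STEPS)) / N_STEPS

for name, world in [("rotating", rotating), ("concentrated", concentrated)]:
    print(name,
          "snapshot:", ensemble_fraction(world, t=0),
          "person 0:", time_fraction(world, 0),
          "person 100:", time_fraction(world, 100))
```

Only the snapshot number is the same in both worlds. In the rotating world every person’s time fraction matches it (the ergodic case); in the concentrated world person 0 spends all of their time in jail and person 100 none of it.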

While pondering the fate of David and Luigi, it occurred to me that I should produce an example of an ergodic system, one where it’s ok to switch between time and ensemble perspectives, just so we all know what that means. Almost nothing interesting is well modeled as ergodic, so the example will be boring. Here goes: your brain makes visual measurements on a time scale of about 20 milliseconds. If I switch between two images more slowly than this, you will notice the change. If I switch much faster, your brain starts averaging over time, and you will perceive something constant in time that contains both images. Aside: I don’t claim to know anything about brains; I’m just guessing this time scale because computer screens used to refresh their images at roughly 50 Hz (every 20 milliseconds) and seemed to flicker, while faster screens are nicer.

In Fig. 1 I’ve created four GIFs, switching between red and blue at increasing frequency. In the first two, we can clearly perceive the red and blue states as distinct: the characteristic time scale of the dynamic is longer than that of the measurement (our vision). The third GIF flickers a little, but, at least to my slow brain, it seems kind of purple. That’s because the characteristic time scale of the dynamic is now similar to, or shorter than, that of the measurement. The final GIF is just purple: it is the static color composed of red and blue with equal weight (RGB code 880088), and I’ve marked it 0 seconds because it’s like switching infinitely fast.

Fig. 1: Switching color between red and blue at different time scales.
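What “red and blue with equal weight” means numerically depends on how you average. The figure labels its static square 880088. A minimal Python sketch of two ways to compute an equal-weight mix of pure red (ff0000) and pure blue (0000ff): averaging the stored 8-bit channel values directly, and averaging emitted light intensity after undoing the standard sRGB gamma curve. The second assumes the screen follows sRGB and that the eye integrates light linearly above the flicker-fusion rate (the Talbot–Plateau law). Neither is claimed to be how the figure was made, and the function names are illustrative:

```python
# Assumption: 8-bit sRGB channel values; whether a given screen actually
# follows the standard sRGB transfer curve below is itself an assumption.
RED = (255, 0, 0)
BLUE = (0, 0, 255)

def to_hex(rgb):
    return "".join(f"{c:02x}" for c in rgb)

def naive_mix(a, b):
    """Average the encoded channel values directly."""
    return tuple(round((x + y) / 2) for x, y in zip(a, b))

def srgb_to_linear(c8):
    """Decode an 8-bit sRGB value to linear light in [0, 1]."""
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(lin):
    """Encode linear light in [0, 1] back to an 8-bit sRGB value."""
    c = lin * 12.92 if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055
    return round(c * 255)

def light_mix(a, b):
    """Average emitted light intensity, as fast flicker would."""
    return tuple(linear_to_srgb((srgb_to_linear(x) + srgb_to_linear(y)) / 2)
                 for x, y in zip(a, b))

print(to_hex(naive_mix(RED, BLUE)))  # 800080
print(to_hex(light_mix(RED, BLUE)))  # bc00bc
```

The two answers differ because sRGB values are gamma-encoded: averaging the stored numbers is not the same as averaging the light a screen emits.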

For the first two images, saying “this is a purple square” leaves out information that’s relevant on the time scale of measurement. If we call the third square “purple,” we’re also replacing a dynamic description (“it switches between red and blue every 20 milliseconds”) with an average (ensemble or time) description. But because our brains are so slow, to us the square is meaningfully both blue and red “simultaneously,” and the description “purple” begins to capture what we need to know.
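A crude way to see the role of time scales is to model the observer as averaging over a fixed measurement window. The Python sketch below assumes a 20 ms window (the rough guess from the text) and a square that flips between red (1.0) and blue (0.0); apart from the 20 ms case, the flip intervals are arbitrary and are not those used in Fig. 1. It prints the average “redness” seen by windows that start at three different moments:

```python
# Toy model (assumption): the square is red (1.0) or blue (0.0) and flips
# every `half_period_ms`; the observer reports the average redness over one
# 20 ms measurement window, the rough guess made in the text.
WINDOW_MS = 20.0
DT_MS = 0.01  # time resolution of the numerical average

def redness(t_ms: float, half_period_ms: float) -> float:
    """1.0 while the square is red, 0.0 while it is blue."""
    return 1.0 if int(t_ms // half_period_ms) % 2 == 0 else 0.0

def perceived(start_ms: float, half_period_ms: float) -> float:
    """Average redness over the window [start_ms, start_ms + WINDOW_MS)."""
    n = int(round(WINDOW_MS / DT_MS))
    return sum(redness(start_ms + k * DT_MS, half_period_ms) for k in range(n)) / n

for half_period in (1000.0, 200.0, 20.0, 1.0):
    readings = [round(perceived(s, half_period), 2) for s in (0.0, 250.0, 1100.0)]
    print(f"flip every {half_period:g} ms -> {readings}")
```

When flips are much slower than the window, the reading depends on when you look (1.0 or 0.0, red or blue). When they are much faster, every window reads about 0.5, that is purple, whenever you look. With flips every 20 ms the answer still depends on the starting moment, a rough analogue of the slight flicker in the third square.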

Long story short: probabilistic descriptions are dangerous territory. They may be ok for a system where

  • any single trajectory through time explores everything that might happen and
  • it does that so fast that, on the time scale we’re interested in, it’s as if everything is happening simultaneously.

For David and Luigi that’s obviously not the case; for the red and blue squares it can be.


35 responses to “Ergodicity, jail, and time scales”

  1. Satya Prakash Akula

    Thank you. Great analysis. Keep going

  2. Hamlet

    Baking a cake takes time: 400 degrees in the oven for 40 minutes. I’ll save time by baking at 16,000 degrees for only one minute!

    1. Valeri

      The outcome dependence on temperature is nonlinear, so your cake will explode in a couple of seconds.

  3. Thierry

    Thanks for the analysis. Insightful.

  4. Q

    Great example. Far from being boring, it is in fact the principle behind how digital micromirror devices (https://en.wikipedia.org/wiki/Digital_micromirror_device) work.

  5. David Barnes

    You have it right. Human visual system latency allows discrete images to be integrated at video frame rates. That is why LCD makers had to increase frame rates to match the perceived performance of CRT, plasma or (now) OLED. Phosphor-based technologies produce fast exponential decays, while LCDs produce square-wave-like transitions. The faster, smoother decay of phosphors really helps… Anyway, thanks for working on this important subject; decisions, not displays!

    1. M Anand

      To help people appreciate the above point:

      Please view this webpage on a newer-generation phone such as a OnePlus 6T or later. You will notice that even the third picture is not purple; it still shows distinct blue and red colours.

  6. João

    I think it’s implicit that the probability in the Guardian article refers to ensemble probability. Why can’t it be used as an ensemble probability?

    1. Ole Peters

      I agree with you – the Guardian statement is one about the ensemble. The error would be to interpret it temporally.

  7. Luigi

    Thank you for your eye-opening theory; it’s such a transformative tool! I know its point is to advise against confusing ensemble and time averages. But I see that it could also explain what happens when we use a model. (I’m just a biology student, so maybe I’m completely wrong.)

    Applying your model (or language), the problem with using any model is that we are naturally ignorant of an over-ensemble A*, of which our reality (the basis of human logic), including all the ensembles you can observe in it, is itself a path.

    For example: all investors on Earth form an ensemble A, but there is an over-ensemble A* with alternative realities “not R” (including alternative elements from which you can generate ensembles), which makes >>our particular reality<< R, with its ensemble A, an element of (or a path in) A*.

    This could be seen as a highly synthetic artificial ensemble, but this is exactly the situation when we use a model.

    An obvious case: right now, a certain stock-to-flow model has gained popularity, and people think it predicts the future price of scarce assets like gold. What it actually does is ex-ante "post-dicting". The back-test shows that the model would have been predictive had it been used, with the essential condition that using it would not have changed the course of history; only then could the model be predictive in the first place. …

    So implicitly we synthesize an artificial over-ensemble without knowing it. Your model then points to a pathological state, like a fallacy, and it connects ergodic thinking to other related fallacies (such as the tautological problem of observer selection bias, or the hated "anthropic principle", which implies a survivorship bias, because when we choose a model, any model, we decide against all the alternatives in A*). Is there a rule or a cap in ergodic theory that defines our reality as the top reality, so that fractal Matryoshka dolls like the one above are not allowed?

    1. Yavor Stefanov

      You mean Bitcoin, don’t you? You also have to account for the self-fulfilling part of the strategies of market participants. The model becomes “a meme” and, thanks to powerful coordination mechanisms like CT (CryptoTwitter), exhibits manifest destiny.

  8. Neil

    Thanks for this, Ole. You have explained very well the reason why the Guardian’s wording is not as clear as it should be.

    “people’s experiences of the penal system are best not talked about in probabilistic terms”.

    What would be a better way for the Guardian to report these figures?

    1. Ole Peters

      Good question. It’s been a while since I read the article. As far as I remember, it was possible to extract from the article what was actually meant, which is good.

      But the headline was dangerously sloppy. The points are too nuanced to capture in a headline, so perhaps it should be boring and neutral, “UK incarceration rates published” or something like that.

      I’m clearly not in the newspaper business…

  9. Lance

    Hi Ole, so to help me understand – is ensemble similar to taking a snapshot in time of a data set vs a timeseries of a data set? And ergodicity is when the snapshot and timeseries datasets are the same? Thanks.

    1. Ole Peters

      That’s approximately correct. Technically, there are some infinities and transients involved, but I think your characterization captures the spirit of it.

  10. Ergodicity: the most over-looked assumption – Neurabites

    […] (iii) Colour perception. Our brains make visual measurements every 20 milliseconds. So if we switch colours any faster than that, the brain will just perceive a time-average which converges with the ensemble-average of the two colours physically mixed. Source: Ergodicity, jail, and time scales; Ergodicity Economics […]

  11. rinijose

    Perhaps this is a language issue, but any native English reader will look at the headline and understand that it means either 1) a randomly selected Black teen is 9x more likely to have been in jail (at least once) than a randomly selected White teen; or 2) “About nine in every 10,000 young black people in the general population were locked up in young offender institutions, secure training centres or secure children’s homes in England and Wales in 2015-16. This compared with one in every 10,000 of those from white ethnic backgrounds”.

    No one would think it refers to duration.
    Am I missing something here? This post seems to introduce a confusion that doesn’t exist, and then to somewhat address that confusion.
