One key observation that helped launch the field of behavioral economics into stardom is called probability weighting: a human cognitive bias to assign higher probabilities to extreme events than … well, than what? Than what someone else thinks the probabilities should be. Below, I will present a very simple mechanistic explanation, most of all for the iconic probability weighting figure in Tversky and Kahneman (1992). The result is a now familiar theme: (behavioral) economics expresses a more or less robust observation in psychological terms, as a persistent cognitive error. Ergodicity economics explains the same observation mechanistically, and as perfectly rational behavior.

1 Definitions and key observation

I won’t go into how probability weighting is established empirically. Instead, I’ll jump into its definition and then mention some caveats.

Probability weighting: People tend to treat extreme events as though they had higher probabilities than they actually do, and (necessarily because of normalization) common events as if they had lower probabilities than they actually do.

Before going any further: what probabilities are, and even whether they have any actual physical meaning at all, is disputed. They are certainly not directly observable: we can’t touch, taste, smell, see, or hear them. Maybe it’s uncontroversial to say that they are parameters in models of ignorance. Consequently, when we make statements about weighting, or misperceiving, probabilities, we will always be on shaky ground.

To keep the discussion as concrete as possible, let’s use a specific notion of probability.

Temporal frequentist probability: The probability of an event is the relative amount of time in which the event occurs, in a long time series.

For example, we could say “the probability of a traffic light being green is 40%.” Of course we don’t have to describe traffic lights probabilistically if we know or control their algorithms, but you can imagine situations where we have no such knowledge or control. If we were to say “the probability of rain falling somewhere in London between 3pm and 4pm on a given day in May is 10%,” we would mean that we’d looked at a long time series of past May days and found that in 10% of the periods from 3pm to 4pm it had rained somewhere in London.
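As a minimal sketch of this counting notion of probability (my own illustration in Python; the rain record below is synthetic stand-in data, not real London observations):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical record: one entry per past May day, True if it rained
# somewhere in London between 3pm and 4pm (synthetic data).
rained = rng.random(680) < 0.10

# Temporal frequentist probability: relative frequency in the time series.
p_rain = rained.mean()
print(f"estimated P(rain, 3pm-4pm) = {p_rain:.3f}")  # close to 0.10
```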

I’ve said what I mean by probability weighting, and what I mean by probability. Two more bits of nomenclature.

  1. I will refer to an experimenter, or scientist, or observer, as a Disinterested Observer (DO); and to a test subject, or observed person, as a Decision Maker (DM). The DO is not directly affected by the DM’s decisions, but the DM is, of course.
  2. I will refer to the probability the DO uses (and possibly controls) in his model by the word “probability,” expressed as a probability density function (PDF), p(x); and to the probabilities that best describe the DM’s decisions by the term “decision weights,” expressed as a PDF, w(x).

Probability weighting, neatly summarized by Barberis (2013), can be expressed as a mapping of probabilities p into decision weights w, a simple function w(p). We could look at these functions directly, but in the literature it’s more common to look at cumulative distribution functions (CDFs) instead. So we’ll look at the CDF for p(x), which is F_p(x)=\int_{-\infty}^{x} p(s)\, ds, and the CDF for w(x), which is F_w(x)=\int_{-\infty}^{x} w(s)\, ds.

Fig. 1 is copied from Tversky and Kahneman (1992): an inverse-S curve describes the mapping between the cumulatives.

[Fig. 1: the inverse-S probability weighting curve, copied from Tversky and Kahneman (1992).]

2 Mechanistic models that generate these observations

Let’s list some mechanistic models that predict this behavior. The key common feature is that the DM’s model will have extra uncertainty, beyond what the DO accounts for in his model. Behavioral economics assumes that the DO knows “the truth,” and the DM is cognitively biased and cannot see this truth. We will be agnostic: the DO and DM have different models. Whether one, both, or neither is “correct” is irrelevant for explaining the observation in Fig. 1. We’re just looking for good reasons for the difference.

2.1 DM estimates probabilities

How can a DM know the probability of anything? In the real world, the only way to find the probability as we have defined it (relative frequency in time) is to look at time series and count: how often was the traffic light green? How often did it rain between 3pm and 4pm in London in May? And so on.

The result is a count: in n=68 out of N=680 observations the event occurred. The best estimate for the probability of the event is then 68/680=10\%. But we know a little more than that: counts of events are usually modeled as Poisson processes, the common null model that assumes no correlation between events. In this null model, the uncertainty in a count of n events goes as \sqrt{n}.
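To make those numbers concrete (a quick check under the Poisson null model, my own arithmetic): with n=68 events in N=680 observations, the standard error on the count is \sqrt{68} \approx 8.2, which translates into an uncertainty of roughly 1.2 percentage points on the 10\% estimate.

```python
import numpy as np

n, N = 68, 680
p_hat = n / N            # best estimate: 0.100
count_err = np.sqrt(n)   # Poisson null model: std. dev. of a count ~ sqrt(n)
p_err = count_err / N    # ~0.012, i.e. about 1.2 percentage points
print(f"p = {p_hat:.3f} +/- {p_err:.3f}")  # p = 0.100 +/- 0.012
```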

A DO faced with these statistics is quite likely to put into his model the most likely value, 10\%. A DM, on the other hand, is likely to take into account the uncertainty in the count in a conservative way. It’s not good to be caught off guard, so let’s assume the DM adds to all probabilities one standard error, so that

Eq. 1 w(x)=\frac{1}{c}\left[p(x)+\sqrt{p(x)}\right],

where c ensures normalization, c=\int_{-\infty}^{+\infty} \left[ p(x) + \sqrt{p(x)}\right] dx.

From here it’s just handle-cranking; a minimal numerical sketch follows the list.

  • specify the DO’s model, p(x)
  • specify the DM’s model, w(x)
  • integrate to find F_p(x) and F_w(x)
  • plot F_w vs. F_p
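
Here is that handle-cranking as a minimal Python sketch (my own illustration, not code from the paper; I assume a standard Gaussian for the DO, matching one panel of Fig. 2):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Outcome grid; the DO's model p(x) is a standard Gaussian here
# (an illustrative assumption; Fig. 2 also shows a Student-t).
x = np.linspace(-10, 10, 10001)
dx = x[1] - x[0]
p = stats.norm.pdf(x)

# Eq. 1: the DM adds one standard error to p(x), then renormalizes.
w = p + np.sqrt(p)
w /= w.sum() * dx  # numerical version of the normalization constant c

# CDFs by numerical integration (simple Riemann sums).
F_p = np.cumsum(p) * dx
F_w = np.cumsum(w) * dx

# Decision weights against probabilities: an inverse-S curve.
plt.plot(F_p, F_w)
plt.plot([0, 1], [0, 1], "k--", lw=0.5)  # identity line for reference
plt.xlabel("$F_p$")
plt.ylabel("$F_w$")
plt.show()
```

Swapping in a fat-tailed density, e.g. stats.t.pdf(x, df=3), for the Gaussian should give a curve like the Student-t case in Fig. 2.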

Fig. 2 shows what happens for a Gaussian distribution and for a fat-tailed Student-t.

[Fig. 2: F_w vs. F_p for a Gaussian and for a fat-tailed Student-t, with the DM adding one Poisson standard error (Eq. 1).]

Generally, probability weighting is a mismatch between the models of the DO and the DM. The canonical inverse-S shape reflects the precautionary principle: it’s best for the DM to err on the side of caution, whereas the DO will often use the most likely probabilities.

2.2 DO confuses ensemble-average and time-average growth

Incidentally, neglecting the detrimental effects of fluctuations (i.e. neglecting the precautionary principle) is one direct consequence of the ergodicity problem in economics: a DO who models people as expectation-value optimizers rather than time-average growth optimizers will find the same type of “probability weighting,” which should really just be seen as an empirical falsification of the DO’s model. The prevalence of these curves could therefore be interpreted as evidence for the importance of adopting ergodicity economics. See also this blog post by Ihor Kendiukhov.
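To see the ergodicity problem in one line (the standard multiplicative coin-toss example from the ergodicity-economics literature, with my own numbers, not an analysis from this post): suppose each round multiplies wealth by 1.5 or 0.6 with equal probability. The ensemble average grows, \langle r \rangle = (1.5+0.6)/2 = 1.05 > 1, but the time-average growth factor decays, (1.5 \times 0.6)^{1/2} = \sqrt{0.9} \approx 0.95 < 1. A DO who attributes the first quantity to the DM will misread the DM’s perfectly sensible caution as probability weighting.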

2.3 DM assumes a broader range of outcomes

Recognizing probability weighting as a simple mismatch between the model of the DO and the model of the DM predicts all sorts of probability weighting curves. Now that we know what they mean, we can make predictions and test them. Fig. 3 is the result of the DO using a Gaussian distribution, and the DM also using a Gaussian distribution, but one with a higher mean and a higher variance. It looks strikingly similar to the observations of Tversky and Kahneman (1992).

The inverse-S shape arises whenever a DM (cautiously) assumes a larger range of plausible outcomes than the DO. This happens whenever the DM has additional sources of uncertainty — did he understand the experiment? Does he trust the DO? Taleb (2019) calls the assumption by the DO that the DM will use probabilities as specified in an experiment the “ludic fallacy:” what may seem sensible to the designer of a game-like experiment may seem less so to a test subject.

[Fig. 3: F_w vs. F_p for a Gaussian DO model and a Gaussian DM model with shifted location and larger scale, compared with the Tversky and Kahneman (1992) curve.]
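
The Fig. 3 construction is even quicker to sketch (again my own illustration; the DM’s mean and standard deviation below are assumed values, chosen only to produce a Fig. 3-like curve):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

x = np.linspace(-20, 20, 20001)
dx = x[1] - x[0]

# DO: standard Gaussian. DM: Gaussian with higher mean and variance
# (loc and scale are illustrative assumptions, not fitted values).
p = stats.norm.pdf(x, loc=0.0, scale=1.0)
w = stats.norm.pdf(x, loc=0.5, scale=1.5)

# CDFs by numerical integration, then the weighting curve.
F_p = np.cumsum(p) * dx
F_w = np.cumsum(w) * dx

plt.plot(F_p, F_w)
plt.plot([0, 1], [0, 1], "k--", lw=0.5)  # identity line for reference
plt.xlabel("$F_p$")
plt.ylabel("$F_w$")
plt.show()
```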

3 Conclusion

Ergodicity economics carefully considers the situation of the DM as living along a timeline. Probability weighting then appears not as a cognitive bias but as an aspect of sensible behavior across time. Unlike the vague postulate of a bias, this view makes specific predictions: it’s often sensible for the DM to assume a larger variance than the DO, but not always. Also, a DO may be aware of the true situation of the DM, and both may use the same model, in which case there won’t be a systematic effect. In other words, the ergodicity-economics conceptualization adds clarity to ongoing research.

Ergodicity economics urges us to consider how real actors have to operate within time, not across an ensemble where probabilities make no reference to time. The precautionary principle is one consequence (because fluctuations are harmful over time); having to estimate probabilities from time series is another. Assuming a perfectly informed, perfectly rational DM, ergodicity economics predicts the observations that in behavioral economics are usually presented as a misperception of the world. Ergodicity economics thus suggests once again that economics jumps to psychological explanations too soon and without need.

What are we to make of probability weighting, then? Just like in the case of utility, I don’t recommend using the concept at all. “Probability weighting” is an extremely complicated way of expressing a rule known by all living things. A bit of surfing wisdom: if in doubt, don’t go out.

p.s. Alex Adamou, Mark Kirstein, Yonatan Berman and I have put up a draft manuscript, which you’re invited to comment on: https://researchers.one/articles/20.04.00012.