I won’t go into how probability weighting is established empirically. Instead, I’ll jump into its definition and then mention some caveats.

**Probability weighting:** *People tend to treat extreme events as though they had higher probabilities than they actually do, and (necessarily, because of normalization) common events as if they had lower probabilities than they actually do.*

Before going any further: what probabilities are, and even whether they have any actual physical meaning at all, is disputed. They are certainly not directly observable: we can’t touch, taste, smell, see, or hear them. Maybe it’s uncontroversial to say they are parameters in models of ignorance. Consequently, when we make statements about weighting, or misperceiving, probabilities we will always be on shaky ground.

To keep the discussion as concrete as possible, let’s use a specific notion of probability.

**Temporal frequentist probability:** *The probability of an event is the relative amount of time in which the event occurs, in a long time series.*

For example, we could say “the probability of a traffic light being green is 40%.” Of course we don’t have to describe traffic lights probabilistically if we know or control their algorithms. But you can imagine situations where we have no such knowledge or control. If we were to say “the probability of rain falling somewhere in London between 3pm and 4pm on a given day in May is 10%” — we would mean that we’d looked at a long time series of days in Mays from the past and found that in 10% of the periods from 3pm to 4pm it had rained somewhere in London.
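This temporal definition can be made concrete in a few lines of code. A minimal sketch, with an invented time series of traffic-light observations:

```python
# Temporal frequentist probability: the probability of a state is the
# relative amount of time spent in that state. The observations below
# are invented for illustration (say, one look at the light per minute).
light = ["green", "red", "red", "green", "red",
         "green", "red", "red", "green", "red"]

p_green = light.count("green") / len(light)
print(p_green)  # 0.4, i.e. "the probability of the light being green is 40%"
```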

I’ve said what I mean by probability weighting, and what I mean by probability. Two more bits of nomenclature.

- I will refer to an experimenter, or scientist, or observer, as a **Disinterested Observer (DO)**; and to a test subject, or observed person, as a **Decision Maker (DM)**. The DO is not directly affected by the DM’s decisions, but the DM is, of course.
- I will refer to the probability the DO uses (and possibly controls) in his model by the word “**probability**,” expressed as a probability density function (PDF), $p$; and to the probabilities that best describe the DM’s decisions by the term “**decision weights**,” expressed as a PDF, $w$.

Probability weighting, neatly summarized by Barberis (2013), can be expressed as a mapping of probabilities $p$ into decision weights $w$, a simple function $w(p)$. We could look at these functions directly, but in the literature it’s more common to look at cumulative density functions (CDFs) instead. So we’ll look at the CDF for $p$, which is $F_p$, and the CDF for $w$, which is $F_w$.

Fig. 1 is copied from Tversky and Kahneman (1992): an inverse-S curve describes the mapping between the cumulatives.

Let’s list some mechanistic models that predict this behavior. The key common feature is that the DM’s model will have extra uncertainty, beyond what the DO accounts for in his model. Behavioral economics assumes that the DO knows “the truth,” and the DM is cognitively biased and cannot see this truth. We will be agnostic: the DO and DM have different models. Whether one, both, or neither is “correct” is irrelevant for explaining the observation in Fig. 1. We’re just looking for good reasons for the difference.

How can a DM know the probability of anything? In the real world, the only way to find the probability as we have defined it — relative frequency in time — is to look at time series and count: how often was the traffic light green? How often did it rain between 3pm and 4pm in London in May? And so on.

The result is a count: in $n$ out of $N$ observations the event occurred. The best estimate for the probability of the event is then $p = n/N$. But we know a little more than that: counts of events are usually modeled as Poisson processes — it’s the null model that assumes no correlation between events, a common baseline. In this null model, the uncertainty in a count goes as $\sqrt{n}$.

A DO faced with these statistics is quite likely to put into his model the most likely value, $p = n/N$. A DM, on the other hand, is likely to take into account the uncertainty in the count in a conservative way. It’s not good to be caught off guard, so let’s assume the DM adds to all probabilities one standard error, $\sqrt{n}/N = \sqrt{p/N}$, so that

Eq. 1 $\quad w(p) = \frac{1}{Z} \left( p + \sqrt{\frac{p}{N}} \right)$,

where $Z$ ensures normalization, $\int w \, \mathrm{d}x = 1$.

From here it’s just handle-cranking.

- specify the DO’s model, $p$,
- specify the DM’s model, $w$,
- integrate to find $F_p$ and $F_w$, and
- plot $F_w$ vs. $F_p$.
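The handle-cranking can be sketched numerically. The following is an illustrative implementation, not the original calculation behind Fig. 2: it assumes a standard Gaussian for the DO’s model and a DM who adds one Poisson standard error (with an assumed observation count) to each probability, along the lines of Eq. 1.

```python
import numpy as np

# Illustrative sketch of the four steps above; all parameters are assumptions.
N = 100                      # assumed number of observations behind the DM's counts
x = np.linspace(-4, 4, 801)  # discretized outcome space

# DO's model: most-likely probabilities from a Gaussian
p = np.exp(-x**2 / 2)
p /= p.sum()

# DM's model: add one standard error of the count, then renormalize (Eq. 1)
w = p + np.sqrt(p / N)
w /= w.sum()

# cumulatives, ready to plot F_w against F_p: the inverse-S curve
F_p = np.cumsum(p)
F_w = np.cumsum(w)
```

Plotting `F_w` against `F_p` produces the inverse-S shape: in the tails, where `p` is tiny, the added standard error dominates and the DM overweights; in the middle, normalization forces underweighting.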

Fig.2 shows what happens for a Gaussian distribution and for a fat-tailed Student-t.

Generally, *probability weighting is a mismatch between the models of the DO and the DM.* The canonical inverse-S shape represents the precautionary principle: it’s best for the DM to err on the side of caution, whereas the DO will often use most likely probabilities.

Incidentally, neglecting the detrimental effects of fluctuations (i.e. neglecting the precautionary principle) is one direct consequence of the ergodicity problem in economics: a DO who models people as expectation-value optimizers rather than time-average growth optimizers will find the same type of “probability weighting,” which should really just be seen as an empirical falsification of the DO’s model. The prevalence of these curves could therefore be interpreted as evidence for the importance of adopting ergodicity economics. See also this blog post by Ihor Kendiukhov.

Recognizing probability weighting as a simple mismatch between the model of the DO and the model of the DM predicts all sorts of probability weighting curves. Now that we know what they mean, we can make predictions and test them. Fig.3 is the result of the DO using a Gaussian distribution, and the DM also using a Gaussian distribution, but one with a higher mean and a higher variance. It looks strikingly similar to the observations of Tversky and Kahneman (1992).

The inverse-S shape arises whenever a DM (cautiously) assumes a larger range of plausible outcomes than the DO. This happens whenever the DM has additional sources of uncertainty — did he understand the experiment? Does he trust the DO? Taleb (2019) calls the assumption by the DO that the DM will use probabilities as specified in an experiment the “ludic fallacy:” what may seem sensible to the designer of a game-like experiment may seem less so to a test subject.

Ergodicity economics carefully considers the situation of the DM as living along a time line. Probability weighting then appears not as a cognitive bias but as an aspect of sensible behavior across time. Unlike the vague postulate of a bias, it can make specific predictions: it’s often sensible for the DM to assume a larger variance than the DO, but not always. Also, a DO may be aware of the true situation of the DM, and both may use the same model, in which case there won’t be a systematic effect. In other words, the ergodicity-economics conceptualization adds clarity to ongoing research.

Ergodicity economics urges us to consider how real actors have to operate within time, not across an ensemble where probabilities make no reference to time. The precautionary principle is one consequence (because fluctuations are harmful over time); having to estimate probabilities from time series is another. Assuming a perfectly informed, perfectly rational DM, ergodicity economics predicts the observations that in behavioral economics are usually presented as a misperception of the world. Ergodicity economics thus suggests once again that economics jumps to psychological explanations too soon and without need.

What are we to make of probability weighting, then? Just like in the case of utility, I don’t recommend using the concept at all. “Probability weighting” is an extremely complicated way of expressing a rule known by all living things. A bit of surfing wisdom: if in doubt, don’t go out.

p.s. Alex Adamou, Mark Kirstein, Yonatan Berman and I have put up a draft manuscript, which you’re invited to comment on: https://researchers.one/articles/20.04.00012.

---

In 2016 Alex Adamou and I published a paper in a journal. It went through a thorough peer-review process, as such papers do. At the end of the process it contained less of what we’d wanted to say, and more things we hadn’t wanted to say. You can read it here, or you can read on.

GDP has been criticized for many reasons. For instance, when a natural disaster strikes, or when a part of a city is destroyed with bombs, the re-building that (hopefully) follows is economic activity, which boosts GDP. To maximize GDP, we could just destroy stuff (up to a point).

Another angle of critique is that GDP just measures how much money people spend but not how they’re actually doing. We could all be quite “rich” in meaningless money-terms but spiritually empty, in a clockwork economy that has forgotten that money is at best a means to an end: it should enable well-being. But often it’s a simple irrelevance, and in many cases a destructive curse that distracts us from the human emptiness it helps create.

When economists are confronted with critiques of GDP, different responses are observed. Some say “yes, and we’ve been working on better measures, like the Human Development Index.” Others say “yes, GDP is a catastrophe — it was designed to measure how fast we could build tanks to help end Germany’s Nazi terror. But soon after that had been achieved, politicians started to use it as a measure of economic well-being, which it really isn’t.” This brings to mind the good advice “measure what you value because you will value what you measure.” Finally, another group of economists will insist that GDP is not a problem at all because no one uses it. I was quite surprised by this last response: love it or loathe it, as far as I can tell GDP is the headline figure of choice for the vast majority of politicians, journalists, and economic analysts.

So much for a nod to the sizeable debate around GDP. I won’t go into other people’s work any further because they’re the experts, of course. Instead, I’ll ask: what does ergodicity economics have to say about GDP?

You may recall that ergodicity economics asks the question whether what happens to the aggregate (the ensemble average) reflects what happens to the individual (over time). A powerful model for addressing this question in economics is geometric Brownian motion (something I’ve previously called the equation of life). In this model, a quantity, $x$, grows by a Gaussian-distributed factor, $r$, in each small time step, $\delta t$,

(1) $\quad x(t + \delta t) = r \, x(t)$.

For what follows, it’s not important whether this equation accurately describes income, but to give ourselves a concrete mental model, let’s assume that it does. Let’s also assume that there are a large number of people, $N$, who all receive an income that grows according to Eq. 1 — in good years it will go up for an individual, in bad years it can also go down. Finally, let’s assume that GDP at time $t$ is just the sum of all these individual incomes at time $t$. We will work with GDP per capita and define

(2) $\quad y(t) = \frac{1}{N} \sum_{i=1}^{N} x_i(t)$.

What we read in the news so often is that GDP went up by 2.3%, and everyone cheers. But during one of the many recent British political campaigns, an observant member of the public explained to a journalist who had told him that one thing or another would be good for GDP: “yes, but that’s not my GDP.”

The journalist duly reported this, and there was much merriment about the lack of understanding of basic economics by the general public.

Ergodicity economics says: the man was not quite so wrong. The UK’s GDP is not his GDP. We’re working with GDP per capita, so this isn’t about the fact that the man doesn’t own the UK. His statement is true in a less trivial sense, too.

When we measure GDP growth (in our simple model, but essentially in real life too), we compute the growth rate of Eq. 2,

(3) $\quad g_{\langle \rangle} = \frac{1}{\delta t} \ln \frac{y(t+\delta t)}{y(t)}$.

Eq. 3 is the growth rate of the mean income. It has an interesting property: it’s invariant under redistribution. I can shuffle the income around the population any way I like — $g_{\langle \rangle}$ is unaffected by that.

For example, let’s say at $t=2020$ everyone earns $50,000 per year, and at $t=2021$ everyone except one person earns nothing, with that one person earning $N$ times $51,500 per year (so that per-capita income is $51,500). The exponential GDP growth rate would then be 3% per year. The country is destroyed, cannibalism has broken out, the trees in the parks have been chopped down for fire wood. But GDP looks fine.
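A quick numerical check of this doomsday scenario (the population size below is an arbitrary assumption; for per-capita GDP to rise 3%, the single earner must take home $N$ times $51,500):

```python
import numpy as np

# Redistribution-invariance of GDP growth: shuffle all income onto one
# person; the growth rate of mean income doesn't care.
N = 1000
x_2020 = np.full(N, 50_000.0)   # everyone earns $50,000
x_2021 = np.zeros(N)            # everyone earns nothing...
x_2021[0] = N * 51_500.0        # ...except one person

# exponential growth rate of per-capita GDP, per year
g_gdp = np.log(x_2021.mean() / x_2020.mean())
print(round(g_gdp, 3))  # 0.03, i.e. GDP looks fine at ~3% per year
```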

Why? The reason is that GDP is an ensemble average (over the finite ensemble that is the population). Ergodicity economics tells us that such averages don’t reflect what happens to the individual. What (typically) happens to the individual is reflected by the time-average growth rate, and that’s where DDP, the Democratic Domestic Product, comes in.

Choosing how to average something means giving weights to chosen entities. Eq. 3 gives equal weight to each dollar. It is a *plutocratic* average, meaning each dollar has the same power over the value of this measure.

Now what if we had computed the time-average growth of income instead? In that measure, we imagine that an individual experiences in sequence all the changes in income that happen to each individual in the population.

(4) $\quad g_d = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\delta t} \ln \frac{x_i(t+\delta t)}{x_i(t)}$.

This procedure gives equal weight to each individual, not to each dollar. It’s the average of the individual income growth rates, not the growth rate of the average income. It has an interesting “no-person-left-behind” property: if even just one individual’s income drops to zero, the whole average is ruined (the logarithm diverges). Clearly, this measure is not invariant under re-shuffling of income. And whereas GDP growth is a *plutocratic* measure of growth, DDP growth is a *democratic* measure: each member of the demos has the same power over the value of this measure. For a given value of GDP growth, DDP growth is higher when the less wealthy are catching up with the wealthy, and lower when the wealthy are pulling ahead.

The different statistics — GDP and DDP — are illustrated in Fig.1.

If I understood his tweet correctly, then Gabriel Zucman recently proposed to call DDP growth “people’s growth” — at least the basic idea is very similar, so I’ll post the figure from his tweet here.

As is often the case, with a little research we can relate concepts that arise naturally in ergodicity economics to concepts that exist somewhere in the economics literature. Let’s use $g_d$ to define DDP. It is the rate at which something grows, so we can define DDP as the thing that grows at $g_d$:

(5) $\quad \frac{\mathrm{d} \ln \mathrm{DDP}}{\mathrm{d}t} = g_d$.

Substituting from Eq. 4 (do it — it’s a pleasing exercise!), we find that DDP is the geometric mean income,

(6) $\quad \mathrm{DDP}(t) = \left( \prod_{i=1}^{N} x_i(t) \right)^{1/N}$.

Under the income dynamics of Eq.1, GDP grows faster than DDP, meaning that the average income grows at a rate that’s greater than the time-average growth rate of income. Or put differently again: mean income grows faster than typical income.

That is only possible if income inequality increases: ever fewer atypically income-rich individuals must account for the difference in growth rates as time goes by.
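The gap between the plutocratic and democratic growth rates can be checked in a toy simulation. A sketch under assumed parameters (log-incomes follow the multiplicative dynamic of Eq. 1 with an assumed drift and volatility; the exact values don’t matter for the ordering):

```python
import numpy as np

# GDP (growth rate of mean income) vs. DDP (mean of individual growth
# rates) under multiplicative income dynamics. Parameters are assumptions.
rng = np.random.default_rng(1)
N, T = 100_000, 50       # people, years
mu, sigma = 0.02, 0.2    # assumed drift and volatility

# simulate log-incomes directly; everyone starts at income 1
log_x = np.zeros(N)
for _ in range(T):
    log_x += (mu - sigma**2 / 2) + sigma * rng.standard_normal(N)

g_gdp = np.log(np.exp(log_x).mean()) / T  # plutocratic: growth of the mean
g_ddp = log_x.mean() / T                  # democratic: mean of the growth rates

print(g_gdp > g_ddp)  # True: mean income outgrows typical income
```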

This in turn suggests a measure of inequality: the difference in growth rates is the growth rate of inequality:

(7) $\quad \frac{\mathrm{d}J}{\mathrm{d}t} = g_{\langle \rangle} - g_d$.

Integrating and re-arranging (another satisfying exercise), we find that the inequality measure $J$ is what’s called the mean-logarithmic deviation (MLD),

(8) $\quad J = \ln \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right) - \frac{1}{N} \sum_{i=1}^{N} \ln x_i$.

The economist Henri Theil identified this quantity as a good measure of income inequality, and it is also known as Theil’s second inequality index. Theil derived it on the basis of information theory, rather than using the dynamic arguments I have presented here.
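As a sanity check, Eq. 8 is just “log of the mean minus mean of the logs,” which is easy to compute for any income sample (the numbers below are invented):

```python
import numpy as np

# Mean-logarithmic deviation (Eq. 8) for an invented income sample.
x = np.array([20_000.0, 40_000.0, 60_000.0, 200_000.0])
J = np.log(x.mean()) - np.log(x).mean()

print(J > 0)  # True; J would be exactly 0 if all incomes were equal
```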

Amartya Sen said of Theil’s work: “But the fact remains that [the Theil index] is an arbitrary formula, and [..] not a measure that is exactly overflowing with intuitive sense.”

(To be precise: Theil proposed two inequality measures; Eq.8 is his second index, whereas Sen commented on his first index, which is the same as the second except for a weighting that prevents the divergence for zero incomes).

Ergodicity economics can provide the lacking intuitive sense: this inequality measure is the difference between average and typical income. It does the right thing: Fig.3, produced by Yonatan Berman at the London Mathematical Laboratory, is a comparison between J and another commonly used measure of income inequality.

One key conflict in economic affairs is that between the individual and the collective — studying the relationship between these two perspectives means concerning oneself with the ergodicity problem. Some, it seems, believe that it is beneficial to let the individual tap into the strength of the collective; others, apparently, believe that the collective must not interfere with the uniqueness of the individual. Few believe that either extreme is desirable, and ergodicity economics provides a good language to speak about the trade-offs involved in moving towards greater or lesser collectivism.

---

I’ll tell you a secret: when I read a statistical statement I often wonder whether it’s a temporal statement or an ensemble statement. Do you do that too? Take this headline for example: “Young black people nine times more likely to be jailed than young white people” from The Guardian. I won’t talk about why this might be, that’s not the point of this post. My point, as usual, is about time versus ensembles. If you read the Guardian article, you’ll find that the headline is an ensemble statement. It’s supposed to convey that the proportion of the ensemble of black people under the age of 18 in the UK who were in jail (in a broad sense) when statistics were collected was nine times higher (0.09%) than the corresponding figure for white people (0.01%).

It would be a temporal statement if it meant that your friend David, who is black, spent around 142 hours in jail before he turned 18, whereas your friend Luigi, who is white, spent only around 16 hours in jail before he turned 18. It obviously doesn’t mean that, but — perhaps less obviously — it also doesn’t mean that when David was born he was in any meaningful way 9 times more likely than Luigi to be in jail before they turned 18 — what happens along a single life path over time is not reflected by these aggregate figures.

The word “likely” does not specify whether the probabilities it reflects are relative frequencies in an ensemble or in time. Such unspecific language is problematic when only one interpretation is correct. So: people’s experiences of the penal system are best not talked about in probabilistic terms. Let’s generalize this recommendation: we shouldn’t talk about anything in probabilistic terms unless we’re convinced that the time and ensemble interpretations of what we’re saying are equivalent. Nassim Taleb, in his latest book, put it laconically as “no probability without ergodicity.”

It’s not just this one example — lots of statistical statements are phrased in probabilistic language, with the implicit (and often false) assumption that ensemble interpretations and temporal interpretations of that language will be equivalent. That assumption is called the “ergodic hypothesis.” In the Guardian example, just reading the headline and then wrongly assuming ergodicity can quite easily lead to horrendous misinterpretations, so let’s watch our language, seriously.

*“What do you want with that baseball bat? I told you I’ll get you your money as time goes to infinity!”*

… will not keep the mob off your back for long, even if you’re telling the truth.

The ergodic hypothesis is designed for so-called “fast” systems, meaning for systems where each trajectory (each person) explores all of its possible states (jail or no jail) over time scales that are short compared to the time scale of measurement. In our example, this would be the case if David and Luigi were each thrown in jail twice a month for a few minutes. Since we only care about where they spent the first 18 years of their lives, saying Luigi spent 0.01% of his time in jail would be good enough (if that were true). Of course that’s not true in this example — relax, your friends David and Luigi don’t even know what a jail looks like.

In reality, instead of David and Luigi rotating in and out of jail all the time, there are a small number of people who spend far more than their fair share of time behind bars (the word “fair,” as often in a probabilistic context, has various meanings here).

While pondering the fate of David and Luigi, it occurred to me that I should produce an example of an ergodic system — one where it’s ok to switch time and ensemble perspectives, just so we all know what that means. Almost nothing interesting is well modeled as ergodic, so the example will be boring. Here it goes: your brain makes visual measurements on a time scale of about 20 milliseconds — if I switch between two images more slowly than this, you will notice the change. If I switch much faster, your brain starts averaging over time, and you will perceive something constant in time that contains both images. Aside: I don’t claim to know anything about brains, I’m just guessing this time scale because computer screens used to refresh their images at roughly 50Hz (every 20 milliseconds) and seemed to flicker, while faster screens are nicer.

In Fig. 1 I’ve created four gifs, switching between red and blue at increasing frequency. In the first two we can clearly perceive the red and blue states as distinct — the characteristic time scale of the dynamic is slower than that of the measurement (our vision). The third gif flickers a little, but — at least to my slow brain — it seems kind of purple. That’s because the characteristic time scale of the dynamic is now similar to, or has surpassed, that of the measurement. The final gif is just purple — this is just the static color composed of red and blue with equal weight (RGB code 880088), and I’ve marked it 0 seconds because it’s like switching infinitely fast.

Fig.1: switching color between red and blue at different time scales.

For the first two images saying “this is a purple square” leaves out information that’s relevant on the time scale of measurement. If we call the third square “purple” we’re also replacing a dynamic description “it switches every 20 milliseconds between red and blue” with an average (ensemble or time) description. But because our brains are so slow, to us the square is meaningfully both blue and red “simultaneously” and the description “purple” is beginning to capture what we need to know.

Long story short: probabilistic descriptions are dangerous territory. They may be ok for a system where

- any single trajectory through time explores everything that might happen and
- it does that so fast that, on the time scale we’re interested in, it’s as if everything is happening simultaneously.

For David and Luigi that’s obviously not the case, for the red and blue squares it can be.

---

As we continue to re-develop economic theory by asking the ergodicity question, things are becoming simpler and clearer, and growth rates have emerged as key mathematical objects. Computing the right growth rate in a setting without uncertainty, it turns out, produces discounting. People who optimize the right growth rate behave as if they were discounting payments exponentially, or hyperbolically, or whichever way the dynamic dictates.

In the setting of decisions under uncertainty, optimizing appropriate growth rates can be mapped to Laplace’s expected utility theory (which is worked out in Peters & Gell-Mann (2016) and inspired the Copenhagen experiment). The growth rate contains a function that we can identify with the utility function in Laplace’s theory (not in Bernoulli’s expected utility theory, which is inconsistent).

In other words: ergodicity economics unifies different branches of decision theory (including intertemporal discounting and expected utility theory) into one concept: growth rate maximization.

Hence the question:

*How do we determine the appropriate growth rate for a given dynamic?*

We all know at least two types of growth rates, probably more. Below, we’ll develop what a growth rate is, really, by going through two well-known examples, spotting similarities, and then generalizing.

Our first example of a growth rate is the simple rate of change.

(1) $\quad \frac{\Delta x}{\Delta t} = \frac{x(t_2) - x(t_1)}{t_2 - t_1}$,

and our second example will be the exponential growth rate

(2) $\quad \frac{\Delta \ln x}{\Delta t} = \frac{\ln x(t_2) - \ln x(t_1)}{t_2 - t_1}$.

We use Eq.(1) when something grows linearly, according to

(3) $\quad x(t) = x(0) + g\,t$.

The rate of change, Eq.1, is then a good growth rate whose value is . But how do we know that? What is it about the dynamic, Eq.(3), that makes Eq.(1) the “right” growth rate? Or put differently: why not state the exponential growth rate, Eq.(2), when someone asks how fast is growing?

Answer: for the additive dynamic, Eq. (3), the growth rate in Eq. (1) has a special and very useful property: it is independent of time — no matter at what $t$ I measure Eq. (1), I always get $g$. Because of this time-independence, evaluating Eq. (1) is a useful way to say how fast the process is.

Actually, let’s write the dynamic, Eq.(3), as a differential equation.

(4) $\quad \frac{\mathrm{d}x}{\mathrm{d}t} = g$,

or equivalently, because $g$ depends neither on $x$ nor on $t$,

(5) $\quad \mathrm{d}x = g\,\mathrm{d}t$.

This second way of writing tells us that the growth rate is really a sort of clock speed. There’s no difference between rescaling $g$ and rescaling $t$ (by the same factor). Intriguing.

We make a mental note: *the growth rate is a clock speed.*

Let’s dig a little deeper here. What kind of clock speed are we talking about? What’s a clock speed anyway?

Or: what’s a clock? A clock is a process that we believe does something repeatedly at regular intervals. It lets us measure time by counting the repetitions. By convention, after 9,192,631,770 cycles of the radiation produced by the transition between two levels of the caesium 133 atom we say “one second has elapsed.” That’s just something we’ve agreed on. But any other thing that does something regularly would work as a clock – the Earth spinning around its axis etc.

When we say “the growth rate of the process is $g$,” we mean that it advances $g$ units on the process scale (here $x$) in one standard time unit (in finance we often choose one year as the unit, Earth going round the Sun). So it’s a conversion factor between the time scales of a standard clock and the process clock.

Of course, a clock is no good if it systematically speeds up or slows down. For processes other than additive growth we have to be quite careful before we can use them as clocks, i.e. before we can state their growth rates.

Now what about the exponential growth rate, Eq.(2)? Let’s use what we’ve just learned and *derive *the process for which Eq.(2) is a good growth rate. We expect to find exponential growth.

We require that Eq. (2) yield a constant, let’s call that $g$ again, irrespective of when we measure it.

(6) $\quad \frac{\Delta \ln x}{\Delta t} = g \quad \text{for all } t$,

or

(7) $\quad \Delta \ln x = g\,\Delta t$,

or indeed, in differential form, and revealing that again **the growth rate is a clock speed:** $\mathrm{d}\ln x$ plays the same role as $\mathrm{d}x$ did in Eq. (5),

(8) $\quad \mathrm{d}\ln x = g\,\mathrm{d}t$.

This differential equation can be directly integrated and has the solution

(9) $\quad \ln x(t) = \ln x(0) + g\,t$.

We solve for the dynamic by writing the log difference as a fraction

(10) $\quad \ln \frac{x(t)}{x(0)} = g\,t$,

and exponentiating

(11) $\quad x(t) = x(0)\, e^{g t}$.

As expected, we find that the exponential growth rate, Eq.(2), is the appropriate growth rate (meaning time-independent) for *exponential* growth.

In terms of clocks, what just happened is this: we insisted that Eq.(2) be a good definition of a clock speed. That requires it to be constant, meaning that the process has to advance on the logarithmic scale, specified in Eq.(2), by the same amount in every time interval (measured on the standard clock, of course — Earth or caesium).
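The constancy that makes Eq. (2) a good clock can be verified directly; a minimal check with an arbitrary rate and starting value:

```python
import numpy as np

# For exponential growth, the quantity in Eq. (2) is the same on every
# measurement interval. Rate and initial value are arbitrary choices.
g = 0.05
t = np.linspace(0.0, 10.0, 11)
x = 3.0 * np.exp(g * t)

rates = np.diff(np.log(x)) / np.diff(t)
print(np.allclose(rates, g))  # True: same answer whenever we measure
```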

Now let’s raise the stakes and assume less, namely only that $x$ grows according to a dynamic that can be written down as a separable differential equation. We could be even more general, but this is bad enough.

How do we define a growth rate now?

Well, we insist that the thing we’re measuring be a time-independent rescaling factor of time, as before. We enforce this by writing down the dynamic in differential form, containing the growth rate as a time-rescaling factor. Then we’ll work backwards and solve for $g$:

(12) $\quad \mathrm{d}x = g\, f(x)\, \mathrm{d}t$

(for linear growth $f(x)$ would just be the constant $1$, and for exponential growth it would be $x$, but we’re leaving it general). We separate variables in Eq. (12) and integrate the differential equation

(13) $\quad \int_{x(t_1)}^{x(t_2)} \frac{\mathrm{d}x'}{f(x')} = g\,(t_2 - t_1)$,

and we’ve got what we want, namely the functional form of $g$:

(14) $\quad g = \frac{1}{t_2 - t_1} \int_{x(t_1)}^{x(t_2)} \frac{\mathrm{d}x'}{f(x')}$.

To make the equation a bit simpler, let’s give the integral a name, the letter $u$, so that

(15) $\quad u(x) = \int^{x} \frac{\mathrm{d}x'}{f(x')}$.

Then we have

(16) $\quad g = \frac{\Delta u}{\Delta t}$,

or

(17) $\quad g = \frac{u(x(t_2)) - u(x(t_1))}{t_2 - t_1}$.

There’s a trick to find the growth rate for a given dynamic, without having to solve integrals.

Imagine the growth rate has the numerical value $1$. To keep things clear we’ll introduce a subscript 1 for this rate-1 process:

(18) $\quad \mathrm{d}x_1 = f(x_1)\, \mathrm{d}t$.

The time that passes, in standard time units, between two levels of $x_1$ is then just $\Delta u$, i.e. we have $u(x_1(t)) = t$. That’s achieved if $u$ is the inverse of the rate-1 dynamic, $u = x_1^{-1}$.

We measure the growth rate by using the actual process as a clock (not the rate-1 process). We take the actual process, generated with whatever the value of the growth rate actually is, $g$; we measure it at two different points in time, $t_1$ and $t_2$ (where time is defined by our standard clock, like that atom); invert it according to $u = x_1^{-1}$; and compare how much time has elapsed on the time scale of the process (which contains $g$) to how much time has elapsed on our standard clock.

The result is the growth rate.

(19) $\quad g = \frac{u(x(t_2)) - u(x(t_1))}{t_2 - t_1}$,

and we conclude that

*The required non-linear transformation $u$ is the inverse of the rate-1 process.* Nice!

That makes total sense, of course: $x$ grows in some wild way, and we just want to know its clock speed $g$. To find that, we have to get rid of the wild growth, i.e. we have to invert the growth — namely, we have to undo how time would be transformed in the rate-1 case.

Let’s quickly check this for the additive and multiplicative dynamics, and then try out a different growth to see that everything works out.

For additive dynamics, Eq. (3), we have the linear rate-1 process $x_1(t) = x(0) + t$. The inverse of the rate-1 process is $u(x) = x - x(0)$: we’re just inverting the identity transformation. So we expect $u$ to be the identity function, which it is: comparing Eq. (16) to Eq. (1), we have $u(x) = x$. We’ve dropped $x(0)$ here because it makes no difference to $g$.

For multiplicative dynamics — as ever — it’s more interesting. The inverse of the rate-1 process for Eq. (11) is $u(x) = \ln \frac{x}{x(0)}$. Again, it fits: Eq. (17) and Eq. (2) match if $u(x) = \ln x$ (in the growth rate computation $x(0)$ cancels out).
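The inversion trick is a one-liner in code. A sketch that recovers an assumed rate $g$ for the additive and multiplicative cases, using the inverses worked out above (the rate and measurement times are arbitrary choices):

```python
import numpy as np

# Recover the growth rate g = (u(x(t2)) - u(x(t1))) / (t2 - t1), where
# u is the inverse of the rate-1 process.
def growth_rate(x1, x2, t1, t2, u):
    return (u(x2) - u(x1)) / (t2 - t1)

g, t1, t2 = 0.07, 2.0, 9.0

# additive dynamics: x(t) = x(0) + g t, so u is the identity
x_add = lambda t: 1.0 + g * t
g_add = growth_rate(x_add(t1), x_add(t2), t1, t2, lambda x: x)

# multiplicative dynamics: x(t) = x(0) e^{g t}, so u is the logarithm
x_mul = lambda t: 1.0 * np.exp(g * t)
g_mul = growth_rate(x_mul(t1), x_mul(t2), t1, t2, np.log)

print(abs(g_add - g) < 1e-9, abs(g_mul - g) < 1e-9)  # True True
```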

Now a case that’s neither exponential (multiplicative) nor linear (additive).

(20) $\quad x(t) = (g\,t)^2$.

It’s a trivial example, but it shows the mechanics. Without any differential equations, we’ll just find the growth rate from the inverse function of the rate-1 process, $x_1(t) = t^2$. That’s

(21) $\quad u(x) = \sqrt{x}$.

According to Eq. (16), $g = \frac{\Delta u}{\Delta t}$,

which is

(22) $\quad g = \frac{\sqrt{x(t_2)} - \sqrt{x(t_1)}}{t_2 - t_1}$.

(Nerdy aside: Section 5 shows that the restriction to dynamics of the separable form in Section 4 amounts to assuming that the rate-1 process $x_1$ has a differentiable inverse.)

Beautiful. It all works out. But doesn’t this remind you of something? Of course, a null model of human behavior must be that people maximize the growth rate of their wealth. That means they do different things, depending on the dynamics. Let’s fix $\Delta t$ for a moment. Under additive dynamics they’ll then optimize $\Delta x$, under multiplicative dynamics they’ll optimize $\Delta \ln x$, and under general dynamics $\Delta u(x)$.

So people optimize the change in a generally non-linear function of wealth… that’s utility theory, and that’s why we called the non-linear transformation $u$. Turns out, this has less to do with your idiosyncratic psychology and more to do with the dynamic to which your wealth is subjected.

I’ll leave the extension of this treatment to a random environment as an exercise. Hint: in a deterministic environment, growth rates are constant in time. In a random environment they are* ergodic *(that’s why at the London Mathematical Laboratory we don’t say “utility function” but “ergodicity mapping”).

You can read more about this in Chapter 2 of our lecture notes (which we’re constantly revising), or in this paper with Alex Adamou: The time interpretation of expected utility theory. *[2019-05-02 addendum: a nice example of a growth process that’s neither linear nor exponential is the body mass of organisms]*

I thank Yonatan Berman and Alex Adamou for headache-inducing discussions about inverse functions and the like.


The aim of the experiment is to find out whether people change their utility functions in response to a change in dynamics. If this is the case, expected-utility theory is falsified; see the **Analysis** section for details.

Both the experiment and this blog post are based on the fact that expected utility theory can be mapped mathematically to time-average growth optimization under the assumption of constant dynamics and fixed gamble duration. This is explained didactically and in detail in Peters and Gell-Mann (2016), with a generalization in Peters and Adamou (2018). The blog post should be self-contained, but these are the references for technical background.

**Gamble:** a gamble is a mathematical object, namely a random variable whose value describes a change in monetary wealth . *Example:* takes the value $2 with 50% probability and -$1 with 50% probability.

**Decision criterion:** a decision criterion is a model of human behavior. It specifies how people evaluate gambles. *Example:* people maximize the expectation value of the wealth change . But many other examples exist; maximizing the 99th percentile of the distribution of is also a decision criterion.

**Dynamic:** a dynamic specifies how a gamble is repeated. *Example:* multiplicative repetition means that the random factor is repeatedly applied to wealth , where is the wealth before the first round of the gamble. In the example gamble above, with initial wealth , multiplicative repetition means that wealth increases by 40% or decreases by 20% with 50/50 chance in each round.

**Gamble duration:** when evaluating gambles, it is usually necessary to know something about how long they take. The duration of one repetition, specified in the dynamic, is the duration of the gamble, .
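A minimal Python sketch of these definitions (variable names are mine; the initial wealth of $5 is inferred from the +40%/−20% figures above and is only illustrative):

```python
import random

# Sketch of the example gamble under the two repetition modes.
random.seed(1)

x0 = 5.0                                       # inferred, illustrative initial wealth
changes = [2.0, -1.0]                          # the example gamble: +$2 or -$1, 50/50
factors = [(x0 + dx) / x0 for dx in changes]   # multiplicative version: [1.4, 0.8]

def repeat_additive(x, rounds):
    """Additive dynamic: a dollar amount is added to wealth each round."""
    for _ in range(rounds):
        x += random.choice(changes)
    return x

def repeat_multiplicative(x, rounds):
    """Multiplicative dynamic: a factor multiplies wealth each round."""
    for _ in range(rounds):
        x *= random.choice(factors)
    return x
```

Under the multiplicative dynamic, wealth indeed rises by 40% or falls by 20% in each round.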

I’ll summarize 350 years of behavioral modeling in three *blue sentences*, with inevitable omissions (apologies for those).

It is more or less fair to attribute the first model of human decision-making to Huygens (1657).

*Model 1:
people maximize expectation values of changes in wealth.*

This model doesn’t work, and that was noticed early in the history of economics. A different model was put forward by Bernoulli (1738).

*Model 2, expected utility theory (EUT):
people maximize expectation values of changes in utility of wealth.*

Many tweaks were added to this model, some of them insightful, others less so. I didn’t know much about this when I proposed yet another model (2011).

*Model 3, ergodicity economics (EE):*

*people maximize time-average growth of wealth.*

Model 2 adds to model 1 an arbitrary non-linear utility function, and consequently generates different behavior. For example, model 1 cannot generate risk-averse behavior, whereas model 2 can. When comparing which model is better at describing an observation, model 2 wins — this has to be so because it includes model 1 as a special case (utility function u(x)=x), so it always does at least as well as model 1. Model 2 has been criticized for being insufficiently constrained because the utility function can be chosen freely. Any individual person may have his own utility function, and that function can have any shape.

Models 1 and 3 differ because the expectation value of wealth is not the time-average of wealth. In technical words: wealth is not ergodic. This is reflected in a difference between the growth rate experienced by wealth over a long time and that experienced by the expectation value of wealth. The technical work required to understand this difference was done in the 1850s to 1870s by Maxwell and Boltzmann as they developed statistical mechanics. Model 3 does not allow arbitrary utility functions. It predicts how people will behave, given the dynamic.

This is where the Copenhagen experiment comes in. For a given dynamic of wealth and fixed gamble durations, optimizing time-average growth rates (EE) can be expressed as optimizing the expectation value of a utility function (EUT). For example, under multiplicative wealth dynamics, EE maps onto EUT with a logarithmic utility function. Under additive dynamics, EE maps onto EUT with a linear utility function. EE is conceptually different from EUT — EE optimizes over time, whereas EUT optimizes over an ensemble.

But because EE can be mapped onto EUT, it’s not trivial to design an experiment or an observation where one theory is clearly wrong and the other is clearly right.

EUT allows an arbitrary utility function, which adds flexibility, but also means it’s less falsifiable as a scientific model.

Model 3, EE, is in principle more falsifiable — it predicts functional forms given the wealth dynamics. The trouble here is that we don’t quite know the wealth dynamics. Real wealth dynamics have strong multiplicative elements, and we could expect people to optimize the expectation value of a logarithmic utility function (that would just be a psychological way of saying “optimize growth over time”). Stock market investments are a clear example of multiplicative wealth growth, but people also invest in health, housing, education etc. Nor is it usually clear what monetary wealth means — for instance parts of one’s wealth may be earmarked for future spending and cannot be subjected even to moderate risk. It’s not clear to what extent EE would be falsified if, say, observations of real investment decisions turned out to be less than perfectly modelled by optimizing expected logarithmic utility.

Model 3 is a recent challenger. Hints of it can be found throughout the literature, starting even before probability theory was formalized in the 1650s, but we developed it as a proper theory beginning in 2007. It found a home at the London Mathematical Laboratory in 2012, and a handful of people have now spent significant time working on it. Model 2, on the other hand, has been the dominant model of economic theory for about 300 years, and I guess that more than half of all Nobel prizes in economics have been awarded for something to do with model 2.

(The pre-registration of the experiment is available here. The setup follows closely the discussion in Peters and Gell-Mann (2016).)

An obvious thought is this: why not set up an experiment? If it’s so hard to know the exact dynamics of wealth in real life, why not create a controlled game? Give people some gambling money and expose that to different dynamics (multiplicative and additive, say) and see if people’s behavior is described by logarithmic utility in the multiplicative setting and by linear utility in the additive setting?

If test subjects really change utility functions in response to a change in dynamics, then EUT is falsified and EE corroborated. If they don’t, then EE is falsified and EUT corroborated.

I didn’t propose, nor support, such an experiment. My feeling was that test subjects just wouldn’t care enough about the experiment and therefore wouldn’t respond properly to changes in experimental conditions. Having read the detailed description of the setup I’ve changed my mind — I now think this strategy could work. Here’s what was done.

9 different symbols are chosen to signify the following 9 possible factors of wealth change: 0.45, 0.55, 0.67, 0.82, 1, 1.22, 1.5, 1.83, 2.24.

For example like this (left to right, top to bottom):

The symbols are distinct fractal images, chosen because they are easy to remember and have no obvious connotations that could influence behavior.

- Initial gambling wealth is 1,000 kr (about US$110).
- A symbol is shown.
- The new wealth is shown (in this first step 670 kr).

Steps 2 and 3 are repeated for about 50 minutes: a fractal flashes up, then the subject’s new wealth is shown, like below, with time going from left to right.

After 50 minutes of training (with a 2-minute break), the subject has a clear sense of the effect each symbol has on his wealth.

- Initial gambling wealth is the outcome from the passive phase.
- Two pairs of symbols are shown, representing two gambles like this

- The subject chooses a gamble, a) or b). According to that gamble, one symbol is randomly selected with 50/50 chance, and — for a subset of rounds (to keep the wealth range under control) — the corresponding factor is applied to the subject’s gambling wealth.

The procedure is repeated approximately 300 times, with one round lasting about 10 seconds. Because all gamble durations are identical, we don’t need to worry about them, and an EUT treatment maps neatly onto time-average growth optimization. The final gambling wealth is paid out in real money to the subject.

Everything is identical to Day 1, except that the symbols now represent a fixed amount of money by which gambling wealth changes (not a fixed factor). The fixed amounts are: -428 kr, -321 kr, -241 kr, -107 kr, 0 kr, +107 kr, +241 kr, +321 kr, +428 kr.

The order of the additive and multiplicative settings is controlled for — some subjects first do multiplicative, then additive, whereas others do it the other way around.

Here’s the question: does a given test subject, let’s call her Alice, tend to behave according to logarithmic utility on the multiplicative day (as predicted by EE) and according to linear utility on the additive day (as also predicted by EE)? This would flatly falsify EUT because under EUT utility functions are stable in time.

We can infer the utility function that describes Alice’s behavior from the choices made by Alice. In the example above Gamble a) is preferable according to logarithmic utility, although Gamble b) is preferable according to linear utility.

Let’s check that: Gamble a) is a 50/50 chance of factors 0.82 and 1.5, and Gamble b) is a 50/50 chance between factors 0.45 and 2.24.

The expected change in logarithmic utility (time-optimal for multiplicative dynamics) is for Gamble a)

½ (ln 0.82 + ln 1.5) ≈ 0.104,

and for Gamble b)

½ (ln 0.45 + ln 2.24) ≈ 0.004.

Thus, Gamble a) is preferable.

Under linear utility (time-optimal for additive dynamics) we find for Gamble a) an expected factor of

½ (0.82 + 1.5) = 1.16,

and for Gamble b)

½ (0.45 + 2.24) = 1.345.

Thus, in contrast to logarithmic utility, linear utility implies that Gamble b) is preferable.
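The arithmetic can be checked in a few lines of Python (the symbols are mine):

```python
import math

# The two gambles from the text, each a 50/50 choice between two factors.
a = [0.82, 1.5]
b = [0.45, 2.24]

def expected_log_growth(factors):
    """Expected change in ln(wealth): the multiplicative-day criterion."""
    return sum(math.log(f) for f in factors) / len(factors)

def expected_factor(factors):
    """Expected wealth factor: the additive-day (linear-utility) criterion."""
    return sum(factors) / len(factors)

log_a, log_b = expected_log_growth(a), expected_log_growth(b)   # ~0.104 vs ~0.004
lin_a, lin_b = expected_factor(a), expected_factor(b)           # 1.16 vs 1.345
```

Logarithmic utility ranks a) above b); linear utility ranks b) above a).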

EE predicts that people on Day 1 (multiplicative) would tend to choose Gamble a) but when faced with the same gamble — the same possible wealth changes — on Day 2 (additive) they would tend to prefer Gamble b). If this behavior is observed, EE is right and EUT is wrong, insofar as such a blunt statement is sensible.

EUT allows Alice any utility function she wants, but whatever that function is, she’s not allowed one on Day 1 and another on Day 2. EUT would allow another test subject, let’s call him Bob, to have a different utility function from that of Alice. But also Bob’s function will have to be the same on Day 1 as on Day 2.

From the point of view of scientific method, this is an interesting point: EUT’s status as a scientific theory rests on utility functions being stable in time. Without this stability EUT would make no statement at all about human behavior — saying that everyone behaves according to his own utility function, and that that function may be different for every decision made, is a non-statement. EUT would make no predictions, therefore would not be falsifiable, and wouldn’t qualify as a scientific theory.

If Alice behaves, as predicted by EE, in accordance with a logarithmic utility function under multiplicative dynamics and according to a linear utility function under additive dynamics, then it’s game over for EUT.

But what if Alice behaves according to logarithmic utility under both types of dynamics? My sense used to be (and still is, though to a lesser extent) that this is quite a likely outcome of this or similar experiments. The reason is that real life, including non-human life, life itself and evolution, is dominated by multiplicative dynamics. I can’t say this too often: life is that which produces more of itself — multiplicative growth. We’ve been conditioned for billions of years to optimize “logarithmic utility,” namely time-average growth in a multiplicative environment. It is quite likely that exposing test subjects to some other dynamic over the time of an experiment — a few hours, perhaps — won’t affect their behavior, they’ll just keep using the heuristics for evaluating risky propositions (gambles) that they’ve developed over their lifetimes, and their ancestors have developed over billions of years of evolution.

Incidentally, this suggests a prediction: a model of logarithmic utility scored under both multiplicative and additive conditions should do better than a model of linear utility also scored under both conditions.

So what would one make of a negative finding, i.e. no switching of utility functions? It would look like a falsification of EE, but that’s probably not the right interpretation. It might just mean that risk preferences developed over a long time will not change quickly enough to be visible in the experiment. It could also mean that people just don’t care about the experiment they’re briefly participating in. Their behavior may be dominated by concerns that aren’t controlled for — real wealth, not “gamble wealth” and the risks that subjects are exposed to in the real world. It could also mean that the experiment focuses on the wrong region of parameter space, for instance in gambles where wealth only changes by small factors (a few percent, say) the difference between additive and multiplicative effects is not important. This would be like the failure of CERN’s Large Electron-Positron Collider in the 1990s to find the Higgs particle below 114 GeV, or like looking for relativistic effects at speeds much slower than the speed of light.

Still, if after an extensive search there’s no evidence of people switching utility functions as predicted by EE, then EE would be a lot less important than I think it is. I would still think that our baseline risk preferences are given by the environment we evolved in and by the dynamics we’re now exposed to. But it would indicate that those preferences are in essence hard-wired and that deviations from the baseline are mostly explained by idiosyncratic (person-specific) preferences, not by differences in circumstances.

So here is the statement I’m comfortable making about the Copenhagen experiment and similar experiments.

1. Evidence of individuals switching from the utility function predicted by EE in one setting to a different utility function predicted by EE in another setting must be seen as a falsification of EUT and a corroboration of EE. This would be direct evidence for the practical significance of the conceptual error in EUT of averaging across parallel worlds. Mathematically, EUT would only be valid under constant dynamics, where it can be mapped to EE (that’s like the correspondence principle in quantum mechanics, or the convergence of Einsteinian to Newtonian mechanics for small velocities).

2. Evidence of individuals failing to switch as predicted by EE is evidence against the importance of EE and in favor of EUT. Such evidence must be carefully evaluated because it is easier to fail to observe something that actually exists (false negative for EE), due to errors or invalid design, than it is to observe a specifically predicted behavior by chance although it does not actually exist (false positive for EE). Absence of evidence is not conclusive evidence of absence.

While we’re all waiting for the results of the experiment to be published, I have heard from a few groups who are now thinking about similar experiments, focusing on the economic aspect more than the neurology. I encourage the experimentalists among you to design relevant experiments. Pinpointing the effect we predict would pinpoint where economics needs to be revised and, conversely, where the conceptual flaw we have highlighted is not practically important.

The people who designed and carried out this experiment are Oliver Hulme, Kristoffer Madsen, David Meder, Tobias Morville, Hartwig Siebner, and Finn Rabe. Many thanks for your great work, and for telling me about this experiment. It may prove to be of tremendous importance for the way we think about human behavior and economics.


Let’s start with Russell’s proof that 1=0 implies he’s the Pope. Russell said the following.

False Proposition:

(Eq.1) 1=0

Theorem 1: I am the Pope.

Proof: Add 1 to both sides of (Eq.1): then we have 2 = 1. The set containing just me and the Pope has 2 members. But 2 = 1, so it has only 1 member; therefore, I am the Pope.

QED.

This makes our work a lot easier. We only have to prove that 1=0, using expected utility theory, and then, by Theorem 1, we know that Bertrand Russell is the Pope.

Here’s the strategy. We identify the contradiction in Bernoulli 1738 and show that it implies 1=0. The contradiction will be of the following form: “ and .” Since , this implies that 1=0. So we’ll go for that. We find a place where Bernoulli says , and then we’ll find another place where he says . That’s all we need to do; the rest was done by Russell.

On p. 24 Bernoulli writes the following:

At least since Laplace 1814, this has been interpreted as follows: the value of an uncertain prospect is the expected change in utility induced by it (Bernoulli converts this utility change into an equivalent certain monetary change, but we won’t do that, to keep things simple). In symbols, if u is the utility function, V is the value of the proposition, and ⟨·⟩ the expectation operator, we have

(Eq.2) V = ⟨Δu⟩.

To really keep things simple, let’s work with a trivial gamble: our initial wealth is x, we have to pay a fee F, and we are guaranteed (probability 1) to receive a payout P. According to (Eq.2) the value of this trivial gamble is then simply

(Eq.3) V = u(x − F + P) − u(x).

If this number is positive we should take the gamble; if it’s negative we should stay away from it. To be specific, let’s use the logarithmic utility function proposed by Bernoulli, u(x) = ln(x), and the following parameters:

Evaluate (Eq.3) with these parameters, and you’ll find that .

Later in the paper, on p.27, Bernoulli contradicts himself (referring to an equation on p.26, see this longer blog post for details).

Bernoulli accompanied this statement with a figure (original Latin version (1738), German version (1896), English version (1954) [update 2020-07-30: an earlier version of this post wrongly stated that the figure was not included in the original 1738 version. This has now been corrected.]). Written as an equation this gives us a different expression for the value , namely

(Eq.4) V = [u(x + P) − u(x)] − [u(x) − u(x − F)].

Evaluating this expression with our chosen parameters yields a different number. Since in both cases we have evaluated the same quantity — the value of the same prospect to the same person of the same wealth, according to Bernoulli’s expected utility theory — we have shown that expected utility theory assigns two different values to one and the same quantity. Two different numbers being equal to each other implies 1=0, and so, using Russell’s Theorem 1, we have also shown the following:

Expected utility theory proves that Bertrand Russell is the Pope.

QED.
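The contradiction is easy to reproduce numerically. The sketch below uses my own rendering of the two valuations and hypothetical parameters (initial wealth $100, fee $10, certain payout $10); the post’s concrete numbers are not reproduced here.

```python
import math

# Hypothetical parameters, chosen so the contradiction is plain:
# pay a $10 fee, receive a guaranteed $10 payout.
x, F, P = 100.0, 10.0, 10.0
u = math.log   # Bernoulli's logarithmic utility

# (Eq.3)-style value: expected change in utility with the fee included.
v_eut = u(x - F + P) - u(x)                          # exactly 0: $10 for $10

# (Eq.4)-style value: utility gain of the payout minus the separately
# computed "disutility to be suffered by losing" the fee.
v_bernoulli = (u(x + P) - u(x)) - (u(x) - u(x - F))  # negative, by concavity
```

The same prospect gets value zero by one passage of the paper and a negative value by the other: two different numbers for one quantity, which is the “1=0” of the proof.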


Bernoulli’s paper was re-published in 1954 in Econometrica (if you don’t like paywalls, here’s a free pdf). This is the standard translation from Latin, and page numbers below refer to it. I have not read the original Latin paper, but the error is spelled out in words, visually in a figure, and also in an equation, so this is not something lost in translation. If you have a copy of the original, please send it to me. *[2018-02-18 addendum: a scan of the original is here.]* The paper is so fundamental to economics that more than 200 years after its publication it was fished out of the proceedings of the Papers of the Imperial Academy of Sciences in Petersburg (Vol. V, 1738, pp. 175-192), translated into English, and published in the leading journal of the field.

Expected utility theory (EUT) is a form of decision theory. Imagine someone offers you a lottery ticket. Prior to Bernoulli, it was assumed that people roughly maximize the expected change in wealth resulting from such a gamble. That is, if initial wealth is x and the possible changes in wealth are Δx, then you’d maximize

(Eq.1) ⟨Δx⟩.

Here, ⟨·⟩ is the expectation operator. Some people like to denote it by E[·], but it’s the same thing. If ⟨Δx⟩ is positive, you’d buy the ticket; if it’s negative, you wouldn’t. But observations soon convinced everyone that this is not how people behave.

EUT was introduced as a refinement of this decision criterion. It says that you will choose the action that maximizes the expected change, not of your wealth but of your utility. The insight here is that people don’t evaluate gambles in isolation but with respect to a reference level (their initial wealth). An extra dollar is worth less to me if I’m rich than if I’m poor. Mathematically, a so-called utility function u(x) is introduced, where x represents wealth.

The most commonly used utility function is u(x) = ln(x). This is motivated by assuming that the extra utility someone attaches to an extra dollar is inversely proportional to the wealth that that someone already has, p.25:

Later on Bernoulli writes this assumption as the differential equation du = dx/x (up to a constant factor), whose solution is the logarithm. Let’s write down what EUT wants us to maximize:

(Eq.2) ⟨Δu⟩ = ⟨ln(x + Δx) − ln(x)⟩.

By re-writing this object you will immediately see that it actually means something else, and that’s the essence of our conceptual critique. Logarithms turn division into subtraction, ln(a/b) = ln(a) − ln(b). So (Eq.2) can be re-written as

(Eq.3) ⟨Δu⟩ = ⟨ln((x + Δx)/x)⟩.

Now divide both sides by the time Δt the gamble takes

(Eq.4) ⟨Δu⟩/Δt = ⟨ln((x + Δx)/x)⟩/Δt.

What’s this? In any field other than economics, this object is called the expected exponential growth rate of wealth. No utility required, no psychology required. Digging just a little bit deeper, the whole story comes to light. Wealth dynamics are fundamentally multiplicative — we can invest wealth to generate more wealth. For such dynamics the exponential growth rate is an ergodic observable, which means its expectation value tells us what happens in the long run. Mystery solved! People just optimize what happens to their wealth over time. Crucially, additive wealth changes Δx are not ergodic, wherefore the expectation value in (Eq.1) does NOT tell us what happens over time.
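A quick simulation makes the difference tangible. The 50/50 factors 0.6 and 1.5 are an illustrative choice of mine: the expected factor per round is 1.05, yet almost every individual trajectory decays over time.

```python
import math
import random

# Illustrative multiplicative gamble: +50% or -40% of wealth, 50/50.
random.seed(2)
factors = [0.6, 1.5]

expected_factor = sum(factors) / 2                   # 1.05: the expectation grows
expected_dlog = (math.log(0.6) + math.log(1.5)) / 2  # ~ -0.053: time-average decays

rounds = 100_000
log_x = 0.0
for _ in range(rounds):
    log_x += math.log(random.choice(factors))
per_round_growth = log_x / rounds   # tracks expected_dlog, not ln(1.05)
```

The expectation value rises 5% per round, while the single trajectory, which is what any one person actually experiences, shrinks at about 5% per round: wealth is not ergodic.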

Economics textbooks and papers miss this point. The problem is treated in an a-temporal space, as a so-called “one-shot game”. The mathematics is the mathematics of things happening in parallel universes, not of things happening over time. Where time is mentioned, it is usually wrongly assumed that indicates what happens over time. Endless arguments ensue over whether is the correct psychological re-weighting people apply to wealth — in reality this is a question about dynamics, with psychology as a second-order effect — of course we can be deceived, confused, or stupid and not act in our best interest. It’s also far from easy to put a number on wealth , or indeed on the different and their probabilities involved in real-world decisions.

The history of economics is a wonderful example of how a basic conceptual error prevents the detection of technical errors and inconsistencies. Without the right concepts it’s just not possible to ask the right questions, let alone find consistent answers.

Let’s put our conceptual troubles aside for a moment and use EUT to address precisely one of the questions Bernoulli asked. My initial wealth is x, I have to pay a ticket fee F, and I can win prizes x_i with probabilities p_i, where i is a random integer denoting which of the possible prizes I win. We imagine the lottery to take a time that doesn’t depend on which prize is won.

EUT says: find the fee F that makes the expected change in utility zero. If I pay more, the lottery will have a negative expected utility change, and EUT would advise me not to take part. Mathematically, here is the object we have to compute:

(Eq.5) ⟨Δu⟩ = Σ_i p_i u(x − F + x_i) − u(x)

That’s it. Now vary F so that the expected change in utility becomes zero. The value of F where that happens is the maximum fee I should pay for the lottery.
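Numerically, the recipe looks like this. The lottery (a 50/50 chance of a $0 or $100 prize), the $1000 initial wealth, and the log utility are hypothetical choices of mine, and the root is found by bisection:

```python
import math

x = 1000.0                  # hypothetical initial wealth
prizes = [0.0, 100.0]       # hypothetical lottery prizes
probs = [0.5, 0.5]

def expected_du(F):
    """Expected change in log utility when paying fee F, per (Eq.5)."""
    return sum(p * math.log(x - F + k) for p, k in zip(probs, prizes)) - math.log(x)

# expected_du decreases in F, with expected_du(0) > 0 > expected_du(99):
# bisect for the root, i.e. the maximum acceptable fee.
lo_f, hi_f = 0.0, 99.0
for _ in range(100):
    mid = (lo_f + hi_f) / 2
    if expected_du(mid) > 0:
        lo_f = mid
    else:
        hi_f = mid
max_fee = (lo_f + hi_f) / 2   # ~ $48.75, a bit below the risk-neutral $50
```

The log-utility fee comes out just below the $50 a risk-neutral decision maker would pay, reflecting the aversion to the downside.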

Bernoulli makes an error when he talks about this fee. The key figure is on p.26, and I’ve scribbled in red a translation of Bernoulli’s notation into the symbols used in Peters and Gell-Mann (2016).

Let me talk you through the figure. The horizontal axis represents wealth x, and the vertical axis represents utility, so that the solid curve is the utility function u(x).

Let’s focus on the solid horizontal line first. The point B is the initial wealth x. The point p, to the left of B, represents the wealth x − F, that is, the initial wealth minus the ticket fee. The points C, D, E, F are wealth levels arrived at by adding to the initial wealth one of the possible prizes x_i.

You may already suspect the problem: let’s imagine outcome 1 occurs in the lottery, meaning we receive the prize x_1. In that case, our wealth will change to x − F + x_1, and not to x + x_1 (which is the position of point C). We have to subtract the fee! The points C, D, E, F marked by Bernoulli have no relevance to the problem (unless the fee is zero).

Now let’s walk along the utility curve. The point o marks the utility of wealth x − F, so that the dashed line po is the drop in utility associated with a drop in wealth from x to x − F. The points G, H, L, M (to the right of B along the utility curve) represent the utilities associated with the irrelevant wealth levels x + x_i. The point O represents the expected utility assuming prizes are received with their respective probabilities *but no fee is paid*, p.26

In Peters and Gell-Mann (2016) we call this object , for want of a better symbol.

Now comes Bernoulli’s error. He claims that the lottery is to be valued by comparing to the loss in utility one would suffer if one were to buy the lottery ticket for its fee but not receive any prize. The key passage is on p.26–27

This amounts to a decision criterion, different from (Eq.5) and inconsistent with EUT. We denote with the symbol what Bernoulli calls “the disutility to be suffered by losing” (Bp in his notation). Now we can write Bernoulli’s decision criterion in symbols. Bernoulli tells us to buy a ticket if the following quantity is positive, and not to buy it if it’s negative

(Eq.6) Σ_i p_i u(x + x_i) − u(x) − [u(x) − u(x − F)].

In principle, one could now say that Bernoulli just had a different model of people’s behavior than does modern economics. That would be bad enough because modern economics claims that it has the same model as Bernoulli. It’s more likely that Bernoulli got confused specifically when he tried to find the maximum fee. Otherwise he seems consistent with EUT, for instance on p.24, where he describes the certain profit (“the value of the risk in question”) that corresponds to an uncertain profit

Let’s really kill this. Why is (Eq.6) not a good decision criterion?

- It is easy to construct an example where the maximum fee I should pay is smaller than the smallest possible prize in the lottery, but the criterion still tells me to refrain from buying a ticket. That makes no sense: such a lottery has no downside risk. I’m guaranteed a positive net profit; the only uncertain element is how much better off I will be after the game. Intriguingly — have another look at the figure — the maximum fee calculated by Bernoulli *is* smaller than the smallest possible prize, assuming that one of the prizes will be won (the distance between B and p is smaller than the distance between B and C).
- Another problem is this: how much should I be willing to pay for one dollar, according to Bernoulli? What? … for one dollar? One dollar, of course! Yes, but not according to Bernoulli. Try it out: for any non-zero amount, the utility gained from receiving the dollars is smaller than the disutility of paying the same dollars as a fee, due to the concavity of the utility function. Therefore, Bernoulli’s criterion tells me that no amount of money is worth that amount of money. Come on! That’s nonsense!
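Both objections are easy to check numerically. The function below is my rendering of criterion (Eq.6), evaluated with hypothetical numbers:

```python
import math

u = math.log   # Bernoulli's log utility

def bernoulli_value(x, fee, prizes, probs):
    """My rendering of (Eq.6): expected utility gain with NO fee deducted,
    minus the separately computed disutility of paying the fee."""
    gain = sum(p * u(x + k) for p, k in zip(probs, prizes)) - u(x)
    pain = u(x) - u(x - fee)
    return gain - pain

# 1) A lottery with no downside: every prize exceeds the $9.90 fee, so a
#    net profit is guaranteed, yet the criterion comes out negative.
v_sure_win = bernoulli_value(x=100.0, fee=9.9, prizes=[10.0, 10.1], probs=[0.5, 0.5])

# 2) Paying one dollar for a certain dollar is also valued negatively.
v_dollar = bernoulli_value(x=100.0, fee=1.0, prizes=[1.0], probs=[1.0])
```

Both values come out negative: the criterion rejects a guaranteed profit and refuses a dollar offered for a dollar.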

Later, on p.33, Bernoulli works through the specific case of the St. Petersburg lottery. This lottery is defined by x_i = 2^(i−1) and p_i = 2^(−i), with i any positive integer. But his method is so nonsensical that he does something really curious. He wants to evaluate his criterion (Eq.6) but realizes that that’s cumbersome. He then says that if wealth is very large and the utility function can be considered linear, his criterion can be approximated by (Eq.5), meaning actual EUT. So he uses actual EUT as an approximation:

Bernoulli’s notation is different from ours: his symbols for the initial wealth and for the fee differ from the x and F we use. Let’s put this back into our notation and show that he really is using criterion (Eq.5) and not his own criterion (Eq.6).

We have

(Eq.7)

…take the logarithm

(Eq.8)

…simplify…

(Eq.9)

…simplify even more and subtract ln(x)…

(Eq.10) Σ_i p_i ln(x − F + x_i) − ln(x) = 0.

This sets the expected change in logarithmic utility to zero, to determine the maximum fee I should pay, as required by EUT. Bernoulli does not use his own criterion (Eq.6) but the generally accepted criterion (Eq.5).
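For completeness, here is a sketch of the fee computation Bernoulli was after, done with criterion (Eq.5) directly: the St. Petersburg prizes and probabilities, log utility, a hypothetical initial wealth of $100, and a truncated sum (depth 200, far past where the terms matter).

```python
import math

x = 100.0   # hypothetical initial wealth
N = 200     # truncation depth for the infinite sum; the tail is negligible

def expected_du(F):
    """Expected change in ln-utility: prizes 2**(i-1) with probabilities 2**-i."""
    return sum(2.0**-i * math.log(x - F + 2.0**(i - 1))
               for i in range(1, N + 1)) - math.log(x)

# expected_du decreases in F; bisect between 0 (positive) and 99 (negative).
lo_f, hi_f = 0.0, 99.0
for _ in range(200):
    mid = (lo_f + hi_f) / 2
    if expected_du(mid) > 0:
        lo_f = mid
    else:
        hi_f = mid
max_fee = (lo_f + hi_f) / 2   # a modest finite fee, a few dollars at this wealth
```

The maximum fee is finite and small relative to wealth, despite the infinite expected prize, which is the resolution of the St. Petersburg paradox that the expected log change delivers.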

Depending on which part of Bernoulli we read, and how carefully we read it, we will come away with a different impression of what utility means and what EUT is. Not surprisingly, the economics literature is littered with arguments and disagreements and invalid studies that seem to arise from a confused use of EUT.

This is a blog post, so let me speculate about the sort of trouble Bernoulli has caused over the centuries. There are special conditions under which criteria (Eq.5) and (Eq.6) are identical. Assuming that they are identical in general (which is often — wrongly — done), one would implicitly assume such conditions. I’ll tell you about two cases with troubling consequences (there are more).

**The fee is zero, F = 0.** Check this for yourself: set F = 0 in (Eq.5) and (Eq.6). They really are the same then. But that’s not a very interesting case: of course I should “buy” the ticket if it doesn’t cost anything. Why not? The hidden assumption of zero fee utterly confused Karl Menger, and he concluded wrongly that utility functions have to be bounded to treat St. Petersburg-like lotteries. Samuelson (1977) was so convinced by Menger’s incorrect argument that he wrote *“Menger 1934 is a modern classic that stands above all criticism.”* Menger’s study (in German and behind a paywall) is here. An English translation (careful, it has a few typos) is in this book:

Menger (1967) *The role of uncertainty in economics.* English translation by W. Schoellkopf and W. G. Mellon. In: M. Shubik (ed) *Essays in mathematical economics in honor of Oskar Morgenstern*, Princeton University Press, Chap. 16, pp. 211–231. I have discussed the problem here.

**The utility function is linear.** Again, check for yourself and set u(x) = x in (Eq.5) and (Eq.6). Again, they are the same. Especially in the context of prospect theory, one often finds statements criticizing EUT that are puzzling. Like this one from the 2002 Nobel Prize Ceremony Speech: *“A key element in prospect theory is that individuals compare uncertain outcomes with a reference level which depends on the decision situation, instead of evaluating the outcome according to an absolute scale.”* EUT already includes such a reference level: initial wealth. Only under linear utility does the reference level cancel out. A researcher who assumes that Bernoulli and EUT are identical is likely to assume implicitly that utility is linear, in which case the reference level cancels out and would have to be re-introduced. Kahneman refers directly to Bernoulli in his Nobel lecture, aware that something is not working in Bernoulli’s theory, but apparently unaware that Bernoulli’s theory is not the same as EUT.

Finally: a plea for treating the problem with the modern mathematical concepts we now have. By this, I mean: start worrying about time, and compute time-average growth rates. The “expected change of logarithmic utility” is nothing but the time-average growth rate of wealth, under multiplicative dynamics. You can learn more about this in our lecture notes. Once this concept has sunk in — that gambles are evaluated according to the growth rates they generate for those who engage in them — the type of confusion that surrounds EUT becomes almost impossible.

EUT is the foundation of modern economics. Despite this, I have yet to find a practitioner who uses it. Of course I may be exposed to an unusual sample, but in my experience investors, bankers, risk managers, gamblers — no one uses EUT. Shouldn’t that give us pause? Economics is devoted to the quantitative evaluation of risky prospects, but the people who quantitatively evaluate risky prospects for a living make no use of the techniques it has developed.

The fundamental and fatal flaw is conceptual: parallel universes are used where there should be time and dynamics. Because of this flaw, there’s nothing to check against; the theory is not falsifiable because it depends on unobservable states of happiness or discomfort. Technical errors and inconsistencies can be argued away. It’s what Pauli called “not even wrong,” and the result of this murkiness is the coexistence of mutually exclusive, contradictory theories. Nothing is wrong and nothing is right. Different “schools of thought” have emerged. Let’s acknowledge that and ask what it means. This happens in science, but it’s always a sign that a deep flaw has to be corrected, that the appropriate language has not been found yet.

We believe we now know that appropriate language. It’s the language of time and dynamics.


Following the vertiginous developments in early 20th century physics, a handful of German words established themselves in the scientific lingua franca — ansatz and gedankenexperiment are two of them. Can we add scheinproblem, please?

Planck’s lecture is available here (in German). It is an attempt to classify non-problems, so that we may spot them instead of wasting our days and nights trying in vain to resolve unresolvable conundrums. With this, Planck gives us a valuable gift: time. Remember this is an 88-year-old speaking, and looking back the question occurs to him: what are we to do with our time among the living? Instead of addressing that unanswerable question, Planck turns it around and tells us what we shouldn’t do: give ourselves a headache over unanswerable questions — scheinproblems, as he calls them.

Not all hard problems are worthy of attention, Planck tells us. Proper unsolvability does not mean that we’re too dumb — it means that the problem is not what it pretends to be. Impossible to solve is not an extrapolation of hard to solve. There’s no hierarchy: easy, medium, hard, impossible. No. Impossible is different. Just as infinity is different from “a very big number.”

Here is an example from his lecture, for which he apologizes in advance because it may offend the audience’s intelligence. He says: from where you’re sitting, this wall over here is the left wall of the lecture hall. But from where I’m standing, the left wall is over there. So which is it? I present to you the unsolvable problem: which is the left wall of the lecture hall? Planck imagined disagreements and sleepless nights, but I’m imagining departments, institutes, and professorships of lecture hall chirality, students of the problem, journals, academic conferences — the whole shebang.

It’s amusing to just go wild with Planck’s lecture hall idea. But he gives us more than this nugget. What are these problems? Can we classify them? Or, more modestly: how can we spot them? In one type, he tells us, the correct answer is a matter of perspective. An electron, in a meaningful sense, is a particle from one point of view. From another point of view it really is a wave. This is similar to the lecture hall chirality problem, and an excuse to remember the enjoyable inter-generational disagreement whereby G. P. Thomson received the 1937 Nobel Prize in Physics for discovering the electron as a wave, whereas his father J. J. Thomson had received the 1906 Nobel Prize in Physics for discovering the electron as a particle.

Incidentally, some 24 years before Planck’s colloquium, Wittgenstein concluded that all philosophical problems are really scheinproblems, and fundamentally linguistic. Abstract language, divorced from real life, confuses our thinking. We end up pondering grammatically correct but meaningless sequences of words like “why can’t unicorns fly?” Wittgenstein doesn’t use the word “Scheinproblem” but describes the phenomenon accurately, in his Tractatus Logico-Philosophicus, 6.53. To Wittgenstein, natural science — as distinct from philosophy — is the set of solvable problems, meaning not scheinproblems. Planck asks in his colloquium how scheinproblems emerge as the scientist veers off course.

Scheinproblems require a questioning of the question. Apart from looking for multiple valid perspectives, we can look for wrong assumptions hidden in the statement of the problem. What’s the inside surface of the Möbius strip in the feature image of this post? The wrong assumption is that such a surface exists. Problems arising from wrong assumptions are not resolved by normal science, they can only be resolved by a paradigm shift. We don’t solve the scheinproblem by addressing the question as posed — we solve it by re-examining the question itself, and by rephrasing it, recognizing it for what it is and possibly posing a related real problem instead. Then we solve that. Think of gamble evaluation. Because changes in expected monetary wealth fail to predict human behavior, researchers in the 18th century asked what is wrong with money (a scheinproblem), or even what is wrong with human behavior (another scheinproblem). Much effort went into answering those questions. A better question, in my opinion, is to ask what is wrong with expectation values. I don’t answer either of the questions as posed, I replace them with a question I find more promising and answer it.

The scheinproblem is not so much solved as eliminated. I cannot overemphasize the difference between the type of solution required by a problem and the type of solution required by a scheinproblem. We don’t expect a stated problem to be resolved by identifying it as a scheinproblem. We expect the problem to be real, and for someone to give us an answer. Gentlemen, my cabinet advisors tell me the lecture hall chirality issue has been defeating the brightest heads of our times. Progress must be made, funding will be available. We want answers!

Another type of scheinproblem is an under-specified problem. There may not be an error in the assumptions behind the problem statement, but we may just not have enough information to solve the problem. In that case, if we fail to recognize the schein-nature of the problem, investigators will run off in different directions and introduce different additional assumptions. We may not even notice it when we’re doing this, and when two different camps meet further down the line, they will discover irreconcilable differences. Different schools of thought emerge, each born of its own additional tacit assumptions. They may even accuse each other — correctly — of introducing additional assumptions, recognizing the fault of the other, but not their own.

Encouragingly, Planck points out that a scientific scheinproblem may turn into a real problem, meaning a solvable problem, following scientific progress. His example is the alchemists’ search for a recipe to transform mercury into gold. This was an impossibility in the context of the tools available to the alchemists. It’s not a problem of chemistry but a problem of nuclear physics, which had not been discovered at the time the problem was stated, and so the alchemists made no progress. But nuclear fission had been discovered 8 tumultuous years before the 1946 colloquium (by Planck’s student Lise Meitner and her friend Otto Hahn): nuclear physics had progressed to understand that knocking a proton out of a mercury nucleus would indeed turn it into gold.

Let me spell out the analogy to economics: we laid the conceptual foundations in the 17th and 18th centuries. Probability theory starts in 1654 with a gambling problem, life annuities are priced in 1693, Daniel Bernoulli introduces expected utility theory in 1738. The mathematical tools available during this time are not so different from the cooking pots of the alchemists. Crucially, the concept of ergodicity and the importance of time-averaging were unknown. These tools became available in the 19th and 20th centuries. And examples abound: the exponential function is all-important in the context of growth, re-investment, reproduction, and evolution. It was properly written down for the first time by Euler in 1748 (see Eli Maor’s delightful book “e: The Story of a Number”, p. 156).

Why didn’t the alchemists immediately turn into chemists and nuclear physicists? Why did their goal not guide them to a solution? My answer to this question is a quote from the 1979 Tarkovsky movie Stalker (which featured two years ago in LML’s Science on Screen program):

With their eyes glued to the prize of making gold, the alchemists failed to listen to the only certain guide in the zone of science: childlike curiosity about nature’s true structure. Identifying a goal, a useful but as yet unknown result, can create a scheinproblem. The result may be wonderful if found, but actually unattainable, or to be found in a wholly unexpected direction.

While Planck’s colloquium was about natural science, let’s end on a philosophical note and return to Wittgenstein. Philosophical problems are scheinproblems, to be solved by identifying them as such. Oddly, this does not detract from their significance. As a problem’s significance increases, how much we can meaningfully say about it often decreases, and at the point of existential significance there is nothing left to say.

“The solution of the problem of life is seen in the vanishing of this problem.

(Is not this the reason why men to whom after long doubting the sense of life became clear, could not then say wherein this sense consisted?)”

— L. Wittgenstein, Tractatus Logico-Philosophicus 6.521.


We started thinking about wealth dynamics some time in 2010 or 2011. We had been studying ensembles of growth processes, and that naturally led to thinking about ensembles of people and their growing wealths. Here’s what we did (“we” being Yonatan Berman, Alex Adamou, and I): we started with a ridiculously simple model for personal wealth, namely geometric Brownian motion (GBM).

(1) $dx = x(\mu\,dt + \sigma\,dW)$

I like to call this the equation of life. Why? Because life can be (and has been) defined as the thing that self-reproduces, and that’s what the equation describes. A quantity that produces more of itself in a noisy way. It describes what happens to the biomass of an embryo in its early stages of development, or to the population of some species growing in a rich environment.

Once we’ve got self-reproduction in an environment with some fluctuations, evolution gets going, and beautiful structures like the ones we see around us follow sooner or later.

Equation (1) doesn’t just model biomass or populations but is also quite good at describing stock price dynamics. So we thought that it might be good at describing personal wealth too. After all, in one way or another both the stock market and our monetary fortunes reflect something that is happening in the economy. Let’s actually name the thing: we’re talking about capitalism. The genius of capitalism is precisely its multiplicative nature. Unused resources — capital — can be deployed to produce more of themselves. In this way a capitalist economy resembles the basic dynamic of evolution.
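The multiplicative dynamic is easy to play with numerically. The sketch below discretizes GBM, $dx = x(\mu\,dt + \sigma\,dW)$, with parameter values chosen purely for illustration. It shows the split that drives everything that follows: the ensemble average grows like $e^{\mu t}$, while the typical individual’s wealth grows at the smaller rate $e^{(\mu - \sigma^2/2)t}$.

```python
import math
import random

random.seed(0)

# Euler-Maruyama discretization of GBM, dx = x (mu dt + sigma dW).
# Parameter values are arbitrary illustrative choices.
mu, sigma = 0.05, 0.3
dt, steps, N = 0.02, 1000, 2000  # T = 20 "years", 2000 individuals

wealth = [1.0] * N
for _ in range(steps):
    for i in range(N):
        dW = random.gauss(0.0, math.sqrt(dt))
        wealth[i] *= 1.0 + mu * dt + sigma * dW

T = steps * dt
ensemble_avg = sum(wealth) / N
typical = math.exp(sum(math.log(w) for w in wealth) / N)  # geometric mean

print(f"ensemble average: {ensemble_avg:.2f} (theory exp(mu*T) = {math.exp(mu * T):.2f})")
print(f"typical wealth:   {typical:.2f} (theory exp((mu - sigma^2/2)*T) = {math.exp((mu - sigma**2 / 2) * T):.2f})")
```

The gap between the two is not a numerical artifact; it is the non-ergodicity of GBM, and it widens exponentially with time.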

Our model does resemble wealth in a capitalist structure, but we were aware of its simplifying assumptions. It pretends that any changes in wealth are proportional to current wealth, whereas I could be poor and nonetheless boost my wealth through earned income. We treat everyone the same and pretend that differences in skill or earnings potential are random and not persistent etc. Nonetheless, we were curious about what would happen in a world where people’s wealth simply followed GBM.

The first observation is this: under GBM the distribution of wealth never stabilizes, not even relative wealth stabilizes (that’s personal wealth divided by total population wealth). If we wait for long enough, essentially one person ends up with all the wealth. That struck us as unrealistic: we don’t live under feudalism. But we used to live under feudalism, so the real dynamic must be less extreme than GBM. That makes some sense — after all, the government collects taxes, and there are institutions that fund all sorts of social programs. We decided to make the model a little more realistic and included re-allocation of wealth. Surely the poor are helped by the rich in some way. So we changed the equation to

(2) $dx_i = x_i(\mu\,dt + \sigma\,dW_i) - \tau x_i\,dt + \tau \langle x \rangle_N\,dt$.

The new terms say this: every year everyone in the economy contributes a proportion $\tau$ of his wealth to a central pot, and then the pot is split evenly across the population ($\langle x \rangle_N$ is per-capita wealth). Again, this is very simplistic — $\tau$ represents a lot of different effects: collective investment in infrastructure, education, social programs, taxation, rents paid, private profits made… The equation can be re-written, which is very neat.

(3) $dx_i = x_i(\mu\,dt + \sigma\,dW_i) + \tau (\langle x \rangle_N - x_i)\,dt$.

This shows that it’s just like GBM (the first term) plus a mean-reversion process that attracts wealth to the population average. If I’m richer than the average, I’m likely to become a little poorer (relative to the average — my wealth can still grow); if I’m poorer, I’m likely to become a little richer. The strength of the reversion is $\tau$, which can be thought of as a social cohesion parameter.

This equation is great! Whereas GBM leads to a diverging (unstable) log-normal distribution of relative wealth, equation (3) with positive re-allocation leads to a stationary inverse-gamma distribution: let the equation run for a while, and the number of people with a given wealth follows an inverse-gamma distribution. That distribution has a power-law tail, similar to what has been observed many times since Pareto‘s first studies. So it’s already pretty good, on a coarse-grained level.
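The stabilization can be seen in a small simulation (a sketch with illustrative parameters of my choosing, not values fitted to data): under plain GBM the variance of log relative wealth grows without bound, while with a positive re-allocation rate it settles to a stationary level.

```python
import math
import random

random.seed(2)

# Sketch of the re-allocating GBM of equation (3),
#   dx_i = x_i (mu dt + sigma dW_i) + tau (<x> - x_i) dt,
# with illustrative parameters (not values fitted to data).
mu, sigma, dt, N = 0.05, 0.3, 0.02, 1000

def dispersion_over_time(tau, steps=1500, checkpoints=4):
    """Return var(log relative wealth) at evenly spaced checkpoints."""
    x = [1.0] * N
    out = []
    for t in range(steps):
        mean = sum(x) / N
        for i in range(N):
            dW = random.gauss(0.0, math.sqrt(dt))
            x[i] += x[i] * (mu * dt + sigma * dW) + tau * (mean - x[i]) * dt
        if (t + 1) % (steps // checkpoints) == 0:
            mean = sum(x) / N
            logs = [math.log(xi / mean) for xi in x]
            m = sum(logs) / N
            out.append(sum((l - m) ** 2 for l in logs) / N)
    return out

gbm = dispersion_over_time(tau=0.0)  # plain GBM: inequality keeps growing
coh = dispersion_over_time(tau=0.5)  # re-allocation: inequality stabilizes

print("var(log relative wealth), GBM:     ", [round(v, 2) for v in gbm])
print("var(log relative wealth), tau=0.5: ", [round(v, 2) for v in coh])
```

For GBM the recorded variance grows roughly linearly in time ($\sigma^2 t$); with re-allocation it is already flat at the first checkpoint, the numerical signature of a stationary distribution.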

What else did we know? Under GBM, wealth cannot become negative. Since the poor are always better off under equation (2) with positive re-allocation, this is also true here.

Thanks to tremendous efforts by many authors, including Tony Atkinson, Thomas Piketty, Emmanuel Saez, Gabriel Zucman, Wojciech Kopcuk, Jesse Bricker, Alice Henriques, Jacob Krimmel, and John Sabelhaus, we have a fairly good idea of the US wealth distribution over the past 100 years. So we took those observed distributions, created 100,000,000 individuals on a computer, fixed the initial wealths directly from the wealth data, set $\mu$ and $\sigma$ roughly to the values observed in the stock market, and let the computer tune $\tau$ each year so as to reproduce the real distributions.

Just for fun, we then looked at the individual wealths that had been produced by this procedure, and we noticed something strange. Many of them were negative. So back to the code, what did we do wrong? An error in the discretization scheme? Some other bug? No, the effect was real.

Here’s what happened: in order to reproduce the data, towards the end of the analyzed period the algorithm had to make $\tau$ negative, see figure 2 below. But what happens under those conditions to equation (3)?

Well, it describes negative re-allocation. Everyone pays the same dollar amount into a central pot, and then everyone receives from the pot an amount in proportion to how much he already has. That means if I have nothing, then I receive nothing but I still have to pay. That can make my wealth negative. Look at equation (3) again, imagining $\tau$ to be negative. The second term now describes mean repulsion. Whereas before wealth was attracted to the population mean, which generates a middle class, now wealth is repelled from it. If I’m a bit richer than the average, I’ll be boosted up even further; if I’m a little poorer, I’ll be pushed down even further. Run this equation for a little while and a large class of negative-wealth individuals arises.
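A sketch of this regime (with an illustrative negative rate of my own choosing; nothing here is fitted to data) shows the negative-wealth class emerging within a few simulated decades.

```python
import math
import random

random.seed(3)

# Equation (3) with a NEGATIVE re-allocation rate: wealth is repelled from
# the mean. All parameter values here are illustrative choices.
mu, sigma, tau = 0.05, 0.3, -0.15
dt, steps, N = 0.02, 1500, 2000  # 30 "years", 2000 individuals

x = [1.0] * N
for _ in range(steps):
    mean = sum(x) / N
    for i in range(N):
        dW = random.gauss(0.0, math.sqrt(dt))
        x[i] += x[i] * (mu * dt + sigma * dW) + tau * (mean - x[i]) * dt

negative_share = sum(1 for xi in x if xi < 0) / N
print(f"fraction with negative wealth after {steps * dt:.0f} years: {negative_share:.1%}")
```

Everyone starts at the same wealth; the repulsion from the mean alone is enough to split the population into a rich class and a sizeable class pushed below zero.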

As it turns out, something like that exists in reality. The cumulative wealth of the poorer half of the American population is roughly zero, meaning there must be a large class of negative-wealth individuals.

This is a blog post, so let me be speculative and push the story a little further than in the paper. How do those who have less than nothing keep giving to the rich? Simple: they go deeper into debt, deeper into negative wealth. But how can that be sustained over a long time? Debts don’t need to be paid off, but they do need to be serviced. To service growing debt with stagnant income (the situation in the US roughly since 1980), we need to lower interest rates.

Interest rates have been falling since about 1980, see Figure 4, precisely the time when the re-allocation rate became negative (c.f. Figure 1). What if there’s a causal link?

Now it gets interesting: interest rates have hit zero. What do we do? How can the poor keep paying the rich? Sure, let’s have some quantitative easing, but can that go on forever? Or will it break at some point? Is redistribution from poor to rich a threat to our monetary system? Is it a threat to our democracy? Where does the system go from here?

Let’s be clear about what we’ve done. We built a simple model and fitted its one main parameter. This wraps everything that’s actually happening into this one parameter. There are loose ends — the model may be fooling us, but we’re certainly not in a regime where we can comfortably rely on stabilization. We don’t claim that the world really works like equation (2); that’s not the point of the exercise. Instead we say “pretend that equation (2) describes the dynamics of wealth; what parameter values would then best resemble what really happens?” The model is no more than a model and as such brushes over many details. For example, we don’t explicitly treat inheritance or income tax or some specific welfare program. Rather, this is all treated implicitly: our $\tau$ summarizes everything that affects the wealth distribution beyond the null model of GBM. It reflects the overall trend in the complete economic system.

That the model produced behavior beyond our (initial) imagination is encouraging. It means we didn’t accidentally constrain our study to confirm our beliefs. We wanted to know by how much we need to slow down the increase in wealth inequality implied by GBM to get to a realistic model. The model said: no, you’re asking the wrong question. GBM actually understates the increase in wealth inequality, and you need to correct the other way. Under GBM relative wealth is non-ergodic. The ergodic hypothesis as it is made in studies of wealth inequality thus excludes GBM as too extreme. Now it turns out that real wealth dynamics are better described by correcting GBM to make it even more strongly non-ergodic. None of us had expected that.

We should have written down our guesses for $\tau$ before we started the study. We didn’t do this, but we certainly thought we would find a positive value. In a private correspondence, from the time before we looked at the data, Jean-Philippe Bouchaud set $\tau$ to a few percent p.a. in an example calculation, and we all felt that was the right order of magnitude. It could be 2% but obviously not as small as 0% (which would be GBM, equation (1)).

The connection to interest rates is speculative, but here’s one rock-solid message about time scales that may hint at how we got here. A change in the effective re-allocation rate, $\tau$, takes decades to feed through. These processes operate on time scales of generations, not election cycles. That means it’s easy to oversteer because the consequences of policy changes only become visible after 30 or 50 years, long after whoever made the policy changes has left office, and at a time when the reasons for making the changes may no longer be valid. We certainly mustn’t assume rapid equilibration. However, rapid equilibration — the ergodic hypothesis — is a standard assumption in studies of wealth distributions.

The basic dynamic of a multiplicative-wealth economy — capitalism — seems underappreciated to me. If we “do nothing” ($\tau = 0$), inequality increases indefinitely. If we re-distribute fast enough ($\tau > 0$), inequality will stabilize at some level. If we actively destabilize ($\tau < 0$), as we seem to have done in recent decades, the middle class vanishes and we create a division between rich and poor — a poor person behaving reasonably is as unlikely to become middle class as a rich person behaving reasonably.

——

p.s. we can make the model arbitrarily complex. One aspect we later singled out is the effect of earnings, by including observed earnings in equation (2). Usually earnings have a stabilizing effect (meaning the process that describes only wealth must be less stable when earnings are treated explicitly). In the last 10 years or so, that stabilizing effect has been absent because of earnings inequality. Consequently, the values we find for $\tau$ with this version of the model are smaller (more negative) up until about 2000 and then unchanged, see figure 5 below.

In 1738 Daniel Bernoulli wrote his famous paper that introduces expected utility theory and thereby defines the basis of neoclassical economics — macro and micro. Since you ask: this paper is famous for its treatment of the St. Petersburg paradox. The “paradox” goes like this:

1. Assume it is rational to evaluate gambles based on the expectation value of net cash gain.
2. Construct a specific example of such a gamble (with a large expected net gain).
3. Observe that no sensible person wants to take that gamble.
4. Conclude that this is paradoxical.

“Paradox” is a strong word here, unless it’s taken very literally — from the Greek παρά + δόξα, meaning something that goes against prevailing belief (or dare I say against expectation). It’s just an inconsistency: the theory says one thing, real people do something else. Popper would have simply called it a falsification of the theory. In any case, it forces us to stop and think. Daniel Bernoulli stopped, thought, and decided he didn’t quite know why this was happening, but he could set up the mathematics so that it would resemble what people do. That bit of mathematics became known as expected utility theory, and it goes like this: apparently, people don’t evaluate gambles based on the expectation value of net cash gain (the first assumption above is wrong). But we could replace cash with a non-linear function of cash. This is called the “utility” function because it looks a bit like it specifies how useful people seem to find cash.

Then we compute the expected net gain of utility, and we choose our utility function so that its expectation value decreases in the proposed gamble. That means people behave as if they’re evaluating gambles based on the expectation value of net utility change.
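The arithmetic is easy to reproduce. The sketch below uses one common convention for the lottery (heads on the k-th toss pays 2^k dollars, which happens with probability 2^(-k); conventions differ by a factor of two) and evaluates both the truncated expected payout and a Bernoulli-style expected change in log utility; the wealth and fee values are my own illustrative choices.

```python
import math

# St. Petersburg lottery, one common convention (variants differ by a factor of 2):
# toss a fair coin until the first heads; heads on toss k pays 2**k dollars,
# which happens with probability 2**(-k).

def expected_payout(n_terms):
    # Truncated expectation value: every term contributes 2**(-k) * 2**k = 1 dollar,
    # so the full sum diverges.
    return sum(2.0 ** -k * 2.0 ** k for k in range(1, n_terms + 1))

def expected_log_utility_change(wealth, fee, n_terms=200):
    # Bernoulli-style criterion: expected change in ln(wealth) after paying the
    # fee and receiving the payout. Converges because ln grows slower than 2**k.
    return sum(
        2.0 ** -k * (math.log(wealth - fee + 2.0 ** k) - math.log(wealth))
        for k in range(1, n_terms + 1)
    )

print(expected_payout(50))  # each term adds one dollar: 50.0

w = 1000.0  # illustrative wealth level
for fee in (2.0, 20.0):  # illustrative fees
    print(f"wealth {w:.0f}, fee {fee:.0f}: "
          f"expected log change {expected_log_utility_change(w, fee):+.4f}")
```

The expected payout grows by one dollar per additional term and so diverges, while the log-utility criterion converges and flips sign at a finite fee — which is how Bernoulli-style reasoning escapes the paradox.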

You can shoot holes into the philosophical basis of this treatment, and indeed I encourage anyone to do this. Why can we just introduce some non-linear function? What does that function mean? Is this framework purely descriptive or does it have any predictive power? What’s the purpose of this mathematical model? Are we just setting up an equation to reflect mathematically what we’ve observed? Isn’t that just like expressing our observation in French or Italian, i.e. in a different language but without adding any insight?

Ergodicity economics treats the problem differently and replaces the expectation value of net cash gain with the time-average growth rate of cash, and people are found to behave as this treatment suggests. So from our perspective it seems that Bernoulli’s 1738 solution is deficient because when he wrote it down the concept of time averages hadn’t been invented yet, and people hadn’t realized that expectation values don’t generally reflect what happens over time, and that an individual has no reason to maximize expected net cash gains because those are averages over many identically prepared systems, whereas an individual is only one system.

Ok — phew — had to get that off my chest. So what about doing a Laplace?

Daniel Bernoulli did not actually compute the expected net change in utility. Wow. I’ll say that again: in the seminal 1738 paper that defines neoclassical economics there’s a mistake. Not just the conceptual mistake I just mentioned (that expected cash changes are irrelevant to the decision maker), no: also in the mathematics.

Daniel Bernoulli did not actually compute the expected net change in utility.

If you’re interested in the details, we’ve written about it (Section IV B, p.6).

Now what does Laplace have to do with this? The answer is: he wrote a textbook, whose second edition is from 1814, and it became a classic. In that classic, on p. 440, he re-tells Bernoulli’s treatment of the St. Petersburg paradox. But Laplace was too polite to mention that Bernoulli didn’t compute the expected net utility change. No one knows why Bernoulli didn’t do it. Maybe he made a mistake, maybe he found his idea so far-fetched that he didn’t think it mattered much and that no one would really pay attention. In any case, Laplace created the myth (apparently harmless at the time) that Bernoulli had computed the expected net change in utility. Others copied the story from Laplace — for example, Todhunter wrote another textbook in 1865, and on p. 220 he tells precisely the same story, using even the same notation as Laplace. This is all understandable: Bernoulli wrote in Latin, Laplace in French (much more accessible). Incidentally, Todhunter’s book was another classic — Ken Arrow told me he had read it as a young student.

This was all well and good until Daniel Bernoulli’s paper fell into the hands of Karl Menger (not Carl, but his son — Karl [Carl had taught economics to the Austrian Crown Prince, Archduke Rudolf von Habsburg, who later committed suicide]). Karl Menger read Bernoulli very carefully, in the original, and because of the inconsistency with Laplace he got terribly confused and in his confusion concluded that only bounded utility functions are permissible. By now the story was so convoluted that no one could quite figure out what was what. Karl Menger’s study was published in 1934 in the Journal of Economics, and to this day it has not been possible to correct it (I submitted a correction to the journal after untangling all of this, but it was rejected on the grounds that, while correct, it was not considered relevant to economics).

For German-speakers: when we were working on “Evaluating gambles…”, Murray Gell-Mann asked me if a Menger is someone who sells sets.

So when someone at LML says “careful not to Laplace this one,” we mean: don’t create a terrible mess just because you’re too polite to say that something isn’t quite right. Maybe we should call it “gentlemanning it” — poor Laplace, he really doesn’t deserve this. We’ve all done it. We avoid conflict but leave the source of the conflict intact, and it will just sit there and fester and eventually break out and wreak havoc.

In Laplace’s case, more than a century after the offending politeness, several economics Nobel Laureates quoted and endorsed Menger. The most impressive quote is from Paul Samuelson [p.32]: *“Menger 1934 is a modern classic that stands above all criticism.”* With that, any hope of scientific critique was extinguished, and the modern classic remains part of the modern canon.

Unfortunately, this meant a lot of hard work was misdirected, and a lot of good work was dismissed (Kelly’s 1956 paper was dismissed on these grounds).

How do we avoid doing a Laplace? What’s the opposite? The opposite of Laplacing an issue is to address it head-on. Often that feels uncomfortable, it means nailing our colors to the mast and articulating a position. It means writing clearly what we actually believe, even if some people may not like it. It’s uncomfortable for social reasons — we don’t want to offend. It’s also uncomfortable because once we’ve said something clearly, we can be found to be wrong. (The important anecdote is that of Wolfgang Pauli walking out of someone’s seminar, shaking his head, and audibly muttering “it isn’t even wrong.” It’s not good to be wrong, but it’s even worse to be unclear — for Laplacian and many other reasons.)

Here’s a situation that’s surprisingly common, and it can indicate a danger of doing a Laplace. Say I’m discussing a draft of some paper with a colleague. The colleague asks what a certain paragraph means, and I say: “oh, yes, what I wanted to say there was X.” I try to catch myself when I do this, and then re-write things explicitly. Insert whatever “X” was, namely the thing I wanted to say. Chances are I didn’t say it clearly because I felt the discomfort of taking a clear position.
