Benford’s law is an observation about the leading digits in sets of numbers that span a wide enough range and which arise “naturally”. Its most well-known claim is that the first digit tends to be 1 about 30% of the time instead of the 11.1% (1 out of 9) that one would expect. Benford’s law was first observed in 1881 by the astronomer Simon Newcomb who saw that tables of logarithms were more worn-out on their early pages (where numbers start with “1”) than on their later pages, suggesting that people just didn’t need logarithms of numbers beginning with 2-9 as often. The physicist Frank Benford assembled numerous real-life examples of the phenomenon and published his findings in 1938.
That digits should be anything other than evenly distributed probably seems outlandish at first, but thinking along the right lines shows that the skew is not only unsurprising but inevitable. The idea to keep in mind is “scale invariance”: multiplying every number in a set by a single constant (i.e. “scaling” the set up or down) should not change the statistical behavior of the digits. For example, multiplying the masses of astronomical objects given in kilograms by 2.204623 to get the same masses in pounds should still give the same statistical patterns in the digits. After all, pounds and kilograms are just units made up by humans to measure mass, not fundamental features of the universe or of the place-value system, so the choice of units shouldn’t affect how the digits behave.
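To see scale invariance in action, here is a quick numerical sketch. The data set is synthetic and hypothetical (log-uniform samples spanning six orders of magnitude, standing in for something like astronomical masses); the point is only that rescaling by the kilograms-to-pounds factor barely moves the leading-digit frequencies.

```python
import random

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(f"{abs(x):e}"[0])  # scientific notation, e.g. '2.2046e+00'

random.seed(0)
# Hypothetical "natural" data set: log-uniform between 10^0 and 10^6.
data = [10 ** random.uniform(0, 6) for _ in range(100_000)]

def digit_freqs(xs):
    counts = [0] * 10
    for x in xs:
        counts[leading_digit(x)] += 1
    return [c / len(xs) for c in counts[1:]]

kg = digit_freqs(data)
lb = digit_freqs([x * 2.204623 for x in data])  # same "masses" in pounds

# The two distributions agree closely despite the change of units.
for d, (a, b) in enumerate(zip(kg, lb), start=1):
    print(f"{d}: {a:.4f} vs {b:.4f}")
```

Leading digit 1 comes out near 30% in both unit systems, already hinting at Benford's prediction.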
Because Benford’s law is so counterintuitive yet obeyed by so many real-life data sets, it is a valuable tool in forensic accounting. Fakers, by definition, want to pass off false numbers as genuine ones, but fabricated numbers often shift the digit distribution away from Benford’s prediction, and a clever auditor can detect that deviation.
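A minimal sketch of how such a check might look: compute a Pearson chi-squared statistic of the observed leading digits against Benford's prediction. This is illustrative only; a real audit would compare the statistic against a proper significance threshold rather than eyeball it.

```python
import math

def leading_digit(x):
    """First significant digit of a positive number."""
    return int(f"{abs(x):e}"[0])

def benford_p(d):
    """Benford's predicted frequency for leading digit d."""
    return math.log10(1 + 1 / d)

def benford_chi2(numbers):
    """Chi-squared statistic of observed leading digits vs. Benford's
    law. Large values flag data whose digits deviate from the law.
    (Illustrative sketch, not an audit-grade test.)"""
    counts = [0] * 10
    for x in numbers:
        counts[leading_digit(x)] += 1
    n = sum(counts[1:])
    return sum((counts[d] - n * benford_p(d)) ** 2 / (n * benford_p(d))
               for d in range(1, 10))
```

Genuine wide-ranging data should yield a small statistic, while, say, digits invented uniformly at random yield a large one.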
A Simple Example
Let’s suppose that each digit $i$ from 1 to 9 appears as the leading digit with probability $p_i$, that the second digits 0–9 are uniformly distributed, and then double all the numbers. We can figure out the probability of each leading digit among the doubled numbers like so:
| Digit | Original Probability | Probability after doubling |
| --- | --- | --- |
| 1 | $p_1$ | $p_5 + p_6 + p_7 + p_8 + p_9$ |
| 2 | $p_2$ | $\frac12 p_1$ |
| 3 | $p_3$ | $\frac12 p_1$ |
| 4 | $p_4$ | $\frac12 p_2$ |
| 5 | $p_5$ | $\frac12 p_2$ |
| 6 | $p_6$ | $\frac12 p_3$ |
| 7 | $p_7$ | $\frac12 p_3$ |
| 8 | $p_8$ | $\frac12 p_4$ |
| 9 | $p_9$ | $\frac12 p_4$ |
The first row means that the only way to get a leading digit of 1 in the doubled number is for the leading digit to have been 5, 6, 7, 8, or 9 in the original number. Similarly, the fourth row says that only half of the original numbers with leading digit 2, namely those whose second digit is 0–4, produce a leading digit of 4 when doubled. The other half, with second digits 5–9, produce a leading digit of 5, as seen in the fifth row.
Scale-invariance means the probabilities in the second column match those in the third, so we have:
$$\displaystyle \begin{aligned} p_1 &= p_5 + p_6 + p_7 + p_8 + p_9 \\ p_2 &= p_3 = \frac12 p_1 \\ p_4 &= p_5 = p_6 = p_7 = \frac12 p_2 = \frac14 p_1 \\ p_8 &= p_9 = \frac12p_4 = \frac18 p_1 \end{aligned} $$
Since the probabilities in each column must add to 1, we can solve for $p_1$ and then get the others, which turn out to match the actual predictions from Benford’s law fairly well:
| Digit | Probability | Benford’s law Prediction |
| --- | --- | --- |
| 1 | $p_1 = \frac{4}{13}\approx 0.3077$ | $0.3010$ |
| 2 | $p_2 = \frac{2}{13}\approx 0.1538$ | $0.1761$ |
| 3 | $p_3 = \frac{2}{13}\approx 0.1538$ | $0.1249$ |
| 4 | $p_4 = \frac{1}{13}\approx 0.0769$ | $0.0969$ |
| 5 | $p_5 = \frac{1}{13}\approx 0.0769$ | $0.0792$ |
| 6 | $p_6 = \frac{1}{13}\approx 0.0769$ | $0.0669$ |
| 7 | $p_7 = \frac{1}{13}\approx 0.0769$ | $0.0580$ |
| 8 | $p_8 = \frac{1}{26}\approx 0.0385$ | $0.0512$ |
| 9 | $p_9 = \frac{1}{26}\approx 0.0385$ | $0.0458$ |
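To double-check the arithmetic, the system can be solved exactly, for instance with Python's `fractions` module (a quick sketch):

```python
from fractions import Fraction

# Express every probability in terms of p1, per the relations above:
# p2 = p3 = p1/2, p4..p7 = p1/4, p8 = p9 = p1/8.
coeff = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(1, 2)}
for d in (4, 5, 6, 7):
    coeff[d] = Fraction(1, 4)
for d in (8, 9):
    coeff[d] = Fraction(1, 8)

p1 = 1 / sum(coeff.values())          # the probabilities must sum to 1
p = {d: c * p1 for d, c in coeff.items()}

print(p[1])   # 4/13
print(p[8])   # 1/26
```

As a sanity check, $p_1 = p_5 + p_6 + p_7 + p_8 + p_9$ also holds exactly, consistent with the first row of the doubling table.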
It turns out that a variation of Benford’s law also applies to the second digits, so we weren’t entirely justified in assuming they were uniformly distributed. But this is all just to give you a sense of why Benford’s law isn’t such an unexpected phenomenon. The real math comes next.
Derivation
We will prove that a scale-invariant distribution of leading digits must be the Benford’s law distribution. Scale-invariance means that for any multiplier $\lambda$:
$$\displaystyle P(X\text{ has leading digit } i) = P(\lambda X\text{ has leading digit } i) $$
The leading digit can be found from the fractional part (everything after the decimal point) of the base-10 logarithm of $X$, which we’ll denote $f = (\log_{10}(X) \mod 1)$. Let $L : [0,1]\to \mathbb{R}$ be the probability density function of $f$. Scale-invariance can then be rephrased as:
$$\displaystyle L(f) = L((f + \log_{10}(\lambda)) \mod 1) $$
For this to hold for every $\lambda$, we must have $L(f) = 1$. Essentially we are asking which distribution remains unchanged when shifted left or right by any amount, with the overspill that extends below 0 or above 1 cut off and pasted onto the other end. Think about it for a while and you’ll see that only a constant, uniform distribution works. This means that scale-invariance is equivalent to the fractional part of the logarithm being uniformly distributed.
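To make this concrete, here is a small simulation on a hypothetical scale-invariant data set (log-uniform samples): the histogram of $(\log_{10} X \bmod 1)$ is flat, and rescaling the data leaves it flat because the shift just wraps around.

```python
import math
import random

random.seed(1)
# Hypothetical data set spanning five orders of magnitude.
data = [10 ** random.uniform(0, 5) for _ in range(100_000)]

def frac_log_hist(xs, bins=10):
    """Histogram of the fractional part of log10(x)."""
    counts = [0] * bins
    for x in xs:
        f = math.log10(x) % 1.0
        counts[int(f * bins)] += 1
    return [c / len(xs) for c in counts]

# Both histograms sit near 1/bins = 0.1 in every bin:
print(frac_log_hist(data))
print(frac_log_hist([3.7 * x for x in data]))   # scaled: still flat
```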
Now we have to translate from “the fractional part of the logarithm” to the leading digit of the original number. This is not so hard:
$$\displaystyle \begin{aligned} P(\text{leading digit of }X\text{ is }i) &= P(\log_{10} i \le (\log_{10}(X) \mod 1) < \log_{10}(i+1)) \\ &= \int_{\log_{10} i}^{\log_{10}(i+1)} L(f) \,df \\ &= \log_{10}\left( \frac{i+1}{i}\right) \end{aligned}$$
This result is where the prediction in the previous table came from. We have used base 10 because that is how people count, but the same argument holds for any base. Just change the base of the logarithm.
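The general formula is easy to tabulate in any base; a quick sketch:

```python
import math

def benford(digit, base=10):
    """P(leading digit = digit) for a scale-invariant set, in the given base."""
    return math.log(1 + 1 / digit, base)

# Base 10 reproduces the predictions in the table above.
for d in range(1, 10):
    print(d, round(benford(d), 4))

# In any base b the probabilities over digits 1..b-1 telescope to 1:
# sum of log_b((d+1)/d) = log_b(b) = 1.
print(sum(benford(d, 8) for d in range(1, 8)))
```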
Wrapping Up
Far from being some mystical oddity, Benford’s law is really just a consequence of how place-value number systems work. The preceding derivation sheds light on what kinds of sets can be expected to follow it. They would have to span several orders of magnitude, so that knowing the leading digit does not tell you much about how large the number is. As an example of a set that would not exhibit this pattern, the heights of all adult humans are scattered around 2 meters. So if I told you that the height of some specific adult started with a 2, you would know that person is around 2 meters tall, not 200 meters or 0.2 meters. In contrast, if I told you that the length of some river starts with a 2, you really have no idea if it runs 200 meters, 200000 meters, or some other distance.
The second property is scale-invariance, which I’ve mentioned a few times already. It applies essentially whenever the data set carries some human-chosen unit of measurement, though there are unitless data sets that still obey Benford’s law. The reason is that Benford’s law can also be derived from base-invariance instead of scale-invariance, though that proof is considerably more involved, so I have not presented it here.