Benford's Law

iBrian

A mathematical curiosity here:
Numbers follow a surprising law of digits, and scientists can't explain why

Does your house address start with a 1? According to a strange mathematical law, about 1/3 of house numbers have 1 as their first digit. The same holds true for many other areas that have almost nothing in common: the Dow Jones index history, size of files stored on a PC, the length of the world’s rivers, the numbers in newspapers’ front page headlines, and many more.

The law is called Benford’s law after its (second) founder, Frank Benford, who discovered it in 1935 as a physicist at General Electric. The law tells how often each number (from 1 to 9) appears as the first significant digit in a very diverse range of data sets.

Besides the number 1 consistently appearing about 1/3 of the time (30.1%), number 2 appears with a frequency of 17.6%, number 3 at 12.5%, on down to number 9 at 4.6%. In mathematical terms, this logarithmic law is written as F(d) = log10[1 + (1/d)], where F is the frequency, d is the digit in question, and the logarithm is taken to base 10.
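For anyone who wants to check the quoted percentages, here is a minimal sketch that evaluates the formula for each digit. It assumes the logarithm is base 10 (which is what reproduces the 17.6%, 12.5% and 4.6% figures); the function name `benford_frequency` is made up for illustration.

```python
import math

def benford_frequency(d: int) -> float:
    """Expected share of leading digit d, using F(d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

# Print the expected share for every possible first digit.
for d in range(1, 10):
    print(f"{d}: {benford_frequency(d):.1%}")
```

The nine shares add up to 1, since the sum telescopes to log10(10).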

If this sounds kind of strange, scientists Jesús Torres, Sonsoles Fernández, Antonio Gamero, and Antonio Sola from the Universidad de Cordoba also call the feature surprising. The scientists published a letter in the European Journal of Physics called “How do numbers begin? (The first digit law),” which gives a short historical review of the law. Their paper also includes useful applications and explains that no one has been able to provide an underlying reason for the consistent frequencies.
 
I don't see this as surprising, I see it as man-made. Newspaper pages: rarely will a section be longer than 20 pages...so if it is less than 10 pages, the numbers start with 1 about 10% of the time; over 10 it increases by leaps with every page...Same with books...a book 10,000 pages long...not likely. So if it breaks 1,000 you'll soon have half the pages beginning with the number one.

Same with house numbers...by convention we start blocks at 100, 1,000 or 10,000 and then another street and another...we don't often get to the twos and threes enough to change this statistic...this isn't a phenomenon...it is taught in zoning classes...

Length of rivers...we often convert measurements to the longest possible unit so we have the smallest possible number...once we get past 12 inches we go to feet, then yards, then miles...or mm, cm, m, km, ever increasing the number of ones by lengthening the measurement unit. If we measured the rivers in meters or yards rather than miles or kilometers we'd get back to an even distribution...

Take man out of this mix (of course our numbering systems are errr...man-made)

Ok, I just looked at a small sampling...121 bills, US currency, where there are numerous digits, but always with zeros put in front of the digits so everything is equal...i.e. 000013. Ones occurred disproportionately rarely...only 7 times! I did have, according to the log scale, a disproportionately large number of 5s and 6s, which according to the article indicates forgeries!!
 
This whole thing is driving me batty! Am I just not getting it, or is this just so obvious?

If there is only one of something, it starts with 1, 100%
2 of, 50% of them start with one
3 of, 33%, and on we go up to
9, where still 11.11% of them start with 1
..then suddenly
10, back to 20%
and by the time we get to 20, 55% are beginning with one.

So now we wait till we hit 100. By the time we get there we are back down to about 11%, and suddenly we again grow by leaps and bounds until 199, where 55.8% of the numbers begin with the number 1!!

So we begin to dwindle again for the next 800 numbers until we reach 999 and we are back at 11.11%, the absolute lowest we can be with consecutive numbers...

but now we hit 1000, so we are on a roll: for the next 1,000 numbers 100% of them begin with one, and by the time we get to 1999 our percentage has increased incrementally until we are back at about 55.6% again...

So 55.55 is our high percentage and 11.11 is our low. Add them together and you get 66.66; divide by 2 and it is 33.33%. And the scientists are baffled why the number one occurs at the beginning of a sequence an abnormally high 1/3 of the time...

I think it is called math.

Are these scientists smarter than a fifth grader?
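The running percentages in the post above are easy to check by brute force. This is only a sketch of that counting exercise (the share of the integers 1..n that begin with a given digit), not a derivation of Benford's law; the helper name `share_starting_with` is made up for illustration.

```python
def share_starting_with(digit: int, n: int) -> float:
    """Fraction of the integers 1..n whose first digit is `digit`."""
    hits = sum(1 for k in range(1, n + 1) if str(k)[0] == str(digit))
    return hits / n

# Cut-off points mentioned in the post, plus the turning points next to them.
for n in (9, 10, 19, 20, 99, 199, 999, 1999):
    print(f"1..{n}: {share_starting_with(1, n):.2%}")
```

The share for the digit 1 keeps swinging between roughly 11% and 56-58% as the cut-off grows, so the answer depends entirely on where the counting stops.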
 
You've lost me wil.

I think that's a happy place to be on this topic. :)
 
So 55.55 is our high percentage and 11.11 is our low. Add them together and you get 66.66; divide by 2 and it is 33.33%. And the scientists are baffled why the number one occurs at the beginning of a sequence an abnormally high 1/3 of the time...

Will, you can't average percentages.
If you do the same exercise for any other number (9 for example) you will probably get the same result.
I got the frequency for each number from 1 to 9, for all numbers between 1 and 999. All have the same frequency: 111 (111x9=999)
So in theory all numbers have the same chance.

I think it has more to do with your earlier explanation, and how we adjust our scales of measuring/counting accordingly, so that 1 ends up with a higher percentage.

From 1 to 9, all numbers have the same frequency=1.
From 10 to 19, 1 has the highest frequency=10
From 20 to 29, 2 has the highest frequency=10
From 30 to 39, 3 has the highest frequency=10
:
:
From 1 to 99, all numbers have the same frequency=11
And so on, the cycle repeats infinitely.
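The tally described above can be reproduced in a few lines. A minimal sketch, with the helper name `leading_digit_counts` made up for illustration; it counts how many of the integers 1..n start with each digit.

```python
from collections import Counter

def leading_digit_counts(n: int) -> Counter:
    """Count how many of the integers 1..n begin with each digit 1-9."""
    return Counter(int(str(k)[0]) for k in range(1, n + 1))

# A range ending in all nines treats every digit equally...
print(sorted(leading_digit_counts(999).items()))
# ...but a range that stops early, such as 1..199, does not.
print(sorted(leading_digit_counts(199).items()))
```

For 1..999 every digit comes out at 111, matching the count in the post. For 1..199 the digit 1 gets 111 and every other digit gets 11.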
 
That only works with infinite numbers...the law of large numbers indicates it won't...there are not many books 900 pages long; more books are in the 100-300 page range, so number one starts the sequence more often...if every number sequence went through 99 or 999 or 9999...you'd be correct...but any set of numbers is going to average out with a random set of numbers whereby 1 is the beginning of the sequence most often...

Unless you are talking serial numbers whereby you include the zeros and have even digits like my dollar bill thing...then zeros occur most frequently...
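One way to test the "sequences stop at different places" idea is to pool the page numbers of many books of different lengths. The sketch below makes a loud assumption that is not in the thread: one hypothetical book of every length from 1 to 999 pages, all equally likely. Real book lengths are not spread like that, so treat the output as an illustration of the argument only. The function name `pooled_leading_digits` is also made up.

```python
from collections import Counter

def pooled_leading_digits(max_length: int) -> Counter:
    """Pool pages 1..n over one hypothetical book of each length n <= max_length."""
    counts = Counter()
    for n in range(1, max_length + 1):      # one book per length (assumption)
        for page in range(1, n + 1):        # every page number in that book
            counts[int(str(page)[0])] += 1  # tally its first digit
    return counts

counts = pooled_leading_digits(999)
total = sum(counts.values())
for d in range(1, 10):
    print(f"{d}: {counts[d] / total:.1%}")
```

Under that assumption 1 is the most common first digit and 9 the least common, even though each individual run 1..999 treats the digits equally. The shares will not necessarily match the log10 formula, since this toy model is not the same thing as Benford's law.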
 
I agree completely. :)
 