Learning Python – Proving the Birthday Paradox
There is a fantastic book that deals with the history, philosophy and genius of maths called Alex’s Adventures in Numberland by Alex Bellos. I’ve just finished reading the chapter on probability in which he discusses the birthday paradox.
This is the idea that in a group of 23 people there is just over a 50% chance of having 2 people who share the same birthday. At first glance this doesn’t seem true, but through a series of logic and probability proofs Bellos shows this is indeed the case.
While formulas are all well and good, I wanted to have a look at some raw data. Conveniently this tied in with me having just started looking into Python, and it presented itself as a good way to learn the basics of control flow and data types in the language.
The program is written quite bluntly: keep choosing random numbers between 1 – 365 and if the number has already come up then output how many numbers have been generated. Then loop this 5000 times and get the average.
The numbers (or dates) chosen are stored in an unsigned integer array (line 16). I could have just used a standard python list, however the unsigned int array is more efficient since the program knows the exact datatype of every value stored. In this particular situation it is unlikely to make a difference, but learning best practise at the start is always good.
I’ve added the code below, it is uncommented but it should be fairly obvious for anyone who has done some programming.
Of course the result is yes, after 5000 loops the number of people counted is almost always between 21 and 25 with a heavy weighting to 23. As you increase the iterations from 5000 to 10000 it narrows this down on 23, reducing it to 100 iterations will leave you with averages from 10 to 40 or wider.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
from array import * import random def main(): minDate = 1 maxDate = 365 num = 5000 i = 1 total = 0 minFound = maxDate + 1 maxFound = minDate - 1 while i < num: dates = array('I') found = False while found == False: insert = random.randint(minDate, maxDate) try: test_found = dates.index(insert) found = insert except: dates.append(insert) lendates = len(dates) total += lendates if minFound > lendates: minFound = lendates if maxFound < lendates: maxFound = lendates print "Loop " + str(i) + ": " + str(lendates) i += 1 print " AVERAGE: " + str(total / num) print " MIN: " + str(minFound) print " MAX: " + str(maxFound) if __name__ == "__main__": main() |
from array import *
import random
def main():
minDate = 1
maxDate = 365
num = 5000
i = 1
total = 0
minFound = maxDate + 1
maxFound = minDate - 1
while i < num:
dates = array('I')
found = False
while found == False:
insert = random.randint(minDate, maxDate)
try:
test_found = dates.index(insert)
found = insert
except:
dates.append(insert)
lendates = len(dates)
total += lendates
if minFound > lendates:
minFound = lendates
if maxFound < lendates:
maxFound = lendates
print "Loop " + str(i) + ": " + str(lendates)
i += 1
print " AVERAGE: " + str(total / num)
print " MIN: " + str(minFound)
print " MAX: " + str(maxFound)
if __name__ == "__main__":
main()
Some banter about “Learning Python – Proving the Birthday Paradox”
comments and stuff
Glad you like the book bruv.
Nice site.