Learning Python – Proving the Birthday Paradox

There is a fantastic book that deals with the history, philosophy and genius of maths called Alex’s Adventures in Numberland by Alex Bellos. I’ve just finished reading the chapter on probability in which he discusses the birthday paradox.

This is the idea that in a group of 23 people there is just over a 50% chance of having 2 people who share the same birthday. At first glance this doesn’t seem true, but through a series of logic and probability proofs Bellos shows this is indeed the case.

While formulas are all well and good, I wanted to have a look at some raw data. Conveniently this tied in with me having just started looking into Python, and it presented itself as a good way to learn the basics of control flow and data types in the language.

The program is written quite bluntly: keep choosing random numbers between 1 – 365 and if the number has already come up then output how many numbers have been generated. Then loop this 5000 times and get the average.

The numbers (or dates) chosen are stored in an unsigned integer array (line 16). I could have just used a standard python list, however the unsigned int array is more efficient since the program knows the exact datatype of every value stored. In this particular situation it is unlikely to make a difference, but learning best practise at the start is always good.

I’ve added the code below, it is uncommented but it should be fairly obvious for anyone who has done some programming.

Of course the result is yes, after 5000 loops the number of people counted is almost always between 21 and 25 with a heavy weighting to 23. As you increase the iterations from 5000 to 10000 it narrows this down on 23, reducing it to 100 iterations will leave you with averages from 10 to 40 or wider.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
from array import *
import random
 
def main():
    minDate = 1
    maxDate = 365
 
    num = 5000
    i = 1
 
    total = 0
    minFound = maxDate + 1
    maxFound = minDate - 1
 
    while i < num:
        dates = array('I')
        found = False
 
        while found == False:
            insert = random.randint(minDate, maxDate)
            try:
                test_found = dates.index(insert)
                found = insert
            except:
                dates.append(insert)
 
        lendates = len(dates)
 
        total += lendates
 
        if minFound > lendates:
            minFound = lendates
 
        if maxFound < lendates:
            maxFound = lendates
 
        print "Loop " + str(i) + ": " + str(lendates)
 
        i += 1
 
    print " AVERAGE: " + str(total / num)
    print " MIN: " + str(minFound)
    print " MAX: " + str(maxFound)
 
if __name__ == "__main__":
    main()
from array import *
import random

def main():
	minDate = 1
	maxDate = 365

	num = 5000
	i = 1

	total = 0
	minFound = maxDate + 1
	maxFound = minDate - 1

	while i < num:
		dates = array('I')
		found = False

		while found == False:
			insert = random.randint(minDate, maxDate)
			try:
				test_found = dates.index(insert)
				found = insert
			except:
				dates.append(insert)

		lendates = len(dates)

		total += lendates

		if minFound > lendates:
			minFound = lendates

		if maxFound < lendates:
			maxFound = lendates

		print "Loop " + str(i) + ": " + str(lendates)

		i += 1

	print " AVERAGE: " + str(total / num)
	print " MIN: " + str(minFound)
	print " MAX: " + str(maxFound)

if __name__ == "__main__":
	main()

Some banter about “Learning Python – Proving the Birthday Paradox”

comments and stuff