Expected number of different birthdays

Suppose there are n people in a room. It is the well known ‘birthday problem‘ that if n > 22 the probability of two people sharing the same birthday is greater than 50%, and so it is very likely that there aren’t n unique birthdays. However, what is the expected number of unique birthdays?

[Let us assume there are D days in a year, in order to be more general.]

Actually, the expected number of unique birthdays is very easy to calculate. Let e(n) denote the expected number of unique birthdays when there are n people.

Choose one of the n people. Then either they have a different birthday than everybody else, or they don’t. They have a different birthday than the other n-1 with probability $\left(\frac{D-1}{D}\right)^{n-1}$. Let us write $\phi$ for $(\frac{D-1}{D})$, and so the probability that they have a different birthday to everybody else is $\phi^{n-1}$. In this case, the expected number of unique birthdays is 1+e(n-1). In the other case, which thus has probability  $1 - \phi^{n-1}$, the expected number is just e(n-1).

Hence,

$e(n) = \phi^{n-1}(1+e(n-1)) + (1-\phi^{n-1})e(n-1) = \phi^{n-1} + e(n-1)$

As e(1) = 1, it follows that:

$e(n) = \phi^{n-1} + \phi^{n-2} + ... + \phi + 1 = (\phi^n - 1)/(\phi-1) = D(1-\phi^n)$

(As one should expect, as n tends to infinity this tends to D, as it is almost certain that every birthday gets taken)

For instance, with D = 365, you should expect about 21 different birthdays when there are 22 people, or 46 different birthdays when there are 50 people. When there are 1000 people, there will be around 341 different birthdays. Perhaps it is quite surprising that you should expect there to be so many (24) unclaimed birthdays with so many people!

Consider the following problem:

Suppose a factory is in a country with the following law: Every worker must work every day, unless it is the birthday of one of the workers. In this case, everybody has the day off. How many people should you hire in order to maximize the expected number of man-days worked every year?

Naturally, if there is only a single person he will certainly work 364 days (D-1) a year. If there are two, they are likely to have different birthdays and so each will work 363 days a year (D-2), giving a total of 726 days.

The expected number of man days worked will be the number of people, n, multiplied by the expected number of non-birthdays. The number of non-birthdays is just D – e(n). Thus we are trying to maximise the following:

$W(n) = n(D-e(n)) = nD\phi^n$

Considering n as a real number instead of an integer, one can find that the function W has a maximum at $n = -1/\log\phi = -1/\log(1-1/D) \approx D - 0.5$. Thus the maximum must be around this, and so for all but the smallest D (where the approximation is not valid), the maximum will be either D-1 or D.

In fact, it is easily seen that W(D) = W(D-1). It follows that one should employee either D or D-1 people. Since you will probably have to pay these workers, it is better to go for the D-1 people.

A simple change of this problem is to consider that for each man-day of work in the factory, you earn £a. However, you must pay each worker £y per year. Then instead you are maximising the yearly income of:

$V(n) =anD\phi^n - ny$

The derivative of this is $V'(n) = aD\phi^n + a D n \phi^n \log \phi - y$. Hence if $V'(n) = 0$ we must have:

$y/aD = \phi^n(1+n \log \phi) = e^{n \log \phi} ( 1 + n \log\phi )$

Define $w = 1 + n\log\phi$ so that:

$y/aD = w e^{w-1}$

And so we have that:

$ey/aD = w e^{w}$

Let us note that $y/aD$ is the ratio between what you pay the worker per year, and what he earns for you per year if he were to work every day. Let us call this ratio P.

Note that when z is positive, the equation $z = w e^w$ always has a unique solution. The function which takes z to w is called the ‘Lambert W Function‘. Hence we have that:

$W(eP) = W(ey/aD) = w = 1 + n\log\phi$

It follows that:

$n = (W(eP)-1)/\log\phi$

As P will be small, and in particular less than 1, an approximation of the Lambert W function may be useful. We will use the Taylor Series approximation $W(x) \approx x-x^2+3x^3/2 - 8x^4/3$. Along with the usual log approximation, this gives:

$n \approx D-\frac{e y}{a}+\frac{e^2 y^2}{a^2 D}-\frac{3 e^3 y^3}{2 a^3 D^2}+\frac{8 e^4 y^4}{3 a^4 D^3}$

Or, in terms of P,

$n \approx \frac{1}{6} D \left(6-6 e P+6 e^2 P^2-9 e^3 P^3+16 e^4 P^4\right)$

For instance, suppose that I pay my workers £18,000 per year. They generate about £350 in profit for me each day they work. They work every day, except for when one of them has a birthday. By evaluating the W function exactly, I find out that the maximum expected profit is when there are around 260 workers. The approximation yields 269 workers. If the profit they generate per day were to be decreased to £75 then the approximation is completely inaccurate. Whilst I should only employee 72 people, the approximation gives over 7,000!