Ramanujan’s “On Question 330 of Prof. Sanjana”

I have written an article that attempts to explain an interesting paper of Ramanujan’s. In this paper he evaluates certain infinite series. The technique is quite pretty in my opinion, and so I hope it is enjoyable.

Download it here.

Preview of the article

Keypads with paint on the buttons

Near my home there is a door that is opened by a keypad. I know that it uses a four-digit code, where the order matters, but I don’t know what the code is.


An example of a keypad

When I walked past it the other day I noticed that there were blue fingerprints over three of the numbers on the keypad, though I did not look very closely, so there may have been four. If there were only three numbers with fingerprints on them then I would infer that one of the digits in the code was repeated.

I wondered this: If I wanted to guess the code as quickly as possible would I prefer that there were three digits or four digits with paint on them?

Intuitively, most people (at least in my informal survey) seemed to expect that the code is more constrained when there are only three digits, and so there are fewer combinations. After all, if there were only one blob of paint, so that all the digits in the code were the same, then there would be only one possible combination, so you might expect that each additional distinct digit adds more freedom and thus more combinations.

In fact, this is not the case, and you would be better placed to guess the code if you knew it had four different digits.

The number of combinations to try if you know there are four different digits is of course 4! = 24. If instead it is known there are only three different digits in the code then one of them is repeated. To calculate the number of combinations, imagine choosing the two positions in the 4-digit code that hold the same digit – there are 6 (i.e. 4C2) ways to do this – and then choose which digit is the one that is repeated – there are 3 to choose from. The remaining 2 digits can be arranged in the remaining 2 positions in 2 ways – therefore there are 6 × 3 × 2 = 36 combinations – 50% more!

I thought this was somewhat surprising.
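
As a quick sanity check, here is a small Python brute force for the two cases (the particular digits used are arbitrary):

    from itertools import product

    # Count codes of the given length in which every painted digit appears at
    # least once (a digit with no paint on it was never pressed).
    def count_codes(painted, length=4):
        return sum(1 for code in product(painted, repeat=length)
                   if set(code) == set(painted))

    print(count_codes("1234"))  # 24 codes when four different digits are painted
    print(count_codes("123"))   # 36 codes when only three digits are painted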


In general, suppose the painted key-pad buttons are \{1,2,\ldots,N\} and the lock takes an M-digit code. Of course, N \le M.

I think it is easier at this point to reason if we forget about locks and talk about functions instead. A function f from \{1,2,\ldots,M\} to \{1,2,\ldots,N\} is equivalent to a possible code if it is surjective. The code it represents is f(1)f(2)\cdots f(M). As f is surjective, every painted digit appears at least once, as is required.

We can therefore restate the problem as counting the number of surjective functions from {1,2,…,M} to {1,2,…,N}.

It’s easy to find the number of functions if we drop the ‘surjective’ requirement – it is N^M. This number includes non-surjective functions – those whose image is some proper subset of \{1,2,\ldots,N\} – as well. An inclusion-exclusion argument can be used to calculate the number of surjections.

In particular, for each choice of i elements of \{1,2,\ldots,N\} to exclude, the number of functions f from \{1,2,\ldots,M\} to \{1,2,\ldots,N\} whose image avoids all i of them is easy to calculate: each of the M values f(j) must take one of the remaining N-i values, giving (N-i)^M such functions. There are \binom{N}{i} ways of choosing the i excluded elements, so define C_i to be the total over all such choices:

C_i = \binom{N}{i} (N-i)^M

Then, by inclusion-exclusion, the number of surjective functions is:

\sum_{i=0}^{N} (-1)^i C_i

Hence the number of surjections, and so the number of possible lock codes of length M using N different digits (each appearing at least once), is:

\sum_{i=0}^{N} (-1)^i \binom{N}{i} (N-i)^M

This is related to Stirling Numbers of the Second Kind.
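
In code, the formula is a one-liner. The following sketch (nothing more than the sum above) reproduces the 24 and 36 from the earlier example:

    from math import comb

    # Number of surjections from an M-element set onto an N-element set, i.e. the
    # number of M-digit codes that use each of N painted digits at least once.
    def surjections(M, N):
        return sum((-1) ** i * comb(N, i) * (N - i) ** M for i in range(N + 1))

    print(surjections(4, 4), surjections(4, 3))  # 24 36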


Not all keypads use four-digit codes. Looking online, most digital keypad locks accept codes of between 1 and 12 digits. The following chart shows the approximate number of combinations that are possible if the length of the code is known (the number on the left-hand side) and the number of painted digits is known (the number on the top):

[Table: exact number of combinations for each code length and number of painted digits]

(Here a million is 10^6 and a billion is 10^9.)

In reality, you may not know the number of digits in the code. Here is the number of combinations that must be tried if you only know an upper bound on the number of digits in the code:

[Table: number of combinations when only an upper bound on the code length is known]


A few additional thoughts:

  • Many digital keypads also require a card to be presented and the code is associated with the particular card (i.e. is a PIN). Then you will need access to the card too.
  • Most mechanical keypads, disappointingly, don’t actually require the digits to be entered in any particular order.
  • There exist keypads that randomise the digits every time they are used. In addition, they can use special screens that can only be seen when looking straight at them.
  • You may be tempted to think that the repeated digit, if there was one, might have more paint on it. In fact, it turned out that the first button pressed had much more paint deposited on it than the others. By way of an example, if you are looking for a four-digit code and there are three paint splodges, then identifying the first digit reduces the 36 combinations to 12 (not 6, as the first digit may also have been the repeated one).

Deductions to do with knowledge and belief

Knowledge and belief are fairly intractable subjects. Nevertheless, we can give some properties that we believe knowledge and belief have, and deduce from that.

First, any statement about the world can be known or believed (or both, or neither). In addition, if A is something that can be known or believed, then so can the statements ‘I believe A’ and ‘I know A’. That is, it’s possible to have beliefs about your knowledge, or knowledge about your beliefs, etc.

Knowledge ought to be stronger than belief. In particular, if you know something you should also believe it. I cannot sensibly know that apples grow on trees but not believe it. If I don’t believe it, it is because I don’t know for sure.

Whilst ‘doublethink’ is perfectly possible in reality, it is not a property of a good belief system. It is fair to assume that if you believe something, you do not believe the opposite of it.

If somebody asks you whether you know a statement is true or not, you’re aware if you don’t know, and can tell the person that. You know your ignorance. That is, if you don’t know something, you know you don’t know it.

And finally, the tightest coupling between belief and knowledge comes from the idea that if you believe something, you believe you know it. Let us give an example. I know that apples grow on trees. I also believe that it would be odd for something to both grow on trees and be an animal. Thus, I conclude in my head that I believe apples are not animals. But I feel my beliefs are fairly secure, so although I don’t know apples are not animals, I believe I know it.

There are other rules one might want to add. For instance, it might be that knowledge is such a strong condition that it is impossible to know something that is false. Others are to do with knowing what you believe, and others still to do with being able to deduce things from knowledge and belief of implications (as was done in the example to do with animals and apples). Nevertheless, we do not need any more assumptions, so we stop here.

The four assumptions we have listed are enough to prove that if one believes something, then one knows it. As one of the assumptions was that knowledge implies belief, together they imply that you believe something if and only if you know it: they become just two words for the same thing. It is possible to conclude different things from this, but I would just say it implies our axioms do not model the system well. I feel the problem is the last assumption – that if you believe something, then you believe you know it. That is far too strong. There are plenty of things I believe – for instance, the truth of certain mathematical conjectures – but I do not believe I know such things. (In fact, I know I do not know them.)

Let us write ¬ for ‘not’, K(A) for ‘I know A’, and B(A) for ‘I believe A’, where A is an arbitrary thing which can be believed and known. The axioms become, in the order they were introduced above:

  1. K(A) implies B(A)
  2. B(A) implies ¬B(¬A)
  3. ¬K(A) implies K(¬K(A))
  4. B(A) implies B(K(A))

Let us also assume the usual laws of logic, e.g. that A implies B is the same as ¬B implies ¬A.

Suppose you believe a statement S. That is, B(S). Then you believe you know it, so B(K(S)) – that is rule 4. But B(K(S)) implies ¬B(¬K(S)), which is an application of rule 2 with A = K(S). That is, it follows that one does not believe that one does not know S.

The contrapositive of rule 1 is that if you don’t believe something, then you don’t know it. Therefore ¬K(¬K(S)) follows: you don’t even know that you don’t know S. But the rule of knowing your own ignorance – rule 3 – then implies you must know S, for if you did not know S, rule 3 would force you to know that you did not know it. Therefore K(S).

Hence, if you believe something, then you know it.
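
As a sanity check, the whole deduction can be written out in a few lines of Lean 4, treating K and B as arbitrary predicates on propositions. This is only a sketch (the hypothesis names are mine, and the final step appeals to classical logic):

    -- The four axioms as hypotheses; the conclusion is that belief implies knowledge.
    theorem believes_implies_knows
        (K B : Prop → Prop)
        (ax1 : ∀ A, K A → B A)          -- 1. knowledge implies belief
        (ax2 : ∀ A, B A → ¬ B (¬ A))    -- 2. no doublethink
        (ax3 : ∀ A, ¬ K A → K (¬ K A))  -- 3. you know your own ignorance
        (ax4 : ∀ A, B A → B (K A))      -- 4. belief implies believing you know
        (S : Prop) (h : B S) : K S :=
      Classical.byContradiction fun hnk =>
        -- Suppose ¬K(S). Rule 3 gives K(¬K(S)), and rule 1 turns that into B(¬K(S)).
        -- But from B(S), rule 4 gives B(K(S)), and rule 2 then forbids B(¬K(S)).
        (ax2 (K S) (ax4 S h)) (ax1 (¬ K S) (ax3 S hnk))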

As mentioned above, I feel axiom 4 is the problem. Nevertheless, if one were to change ‘belief’ and ‘knowledge’ to ‘conjecture’ and ‘formal proof/strong evidence’, to model mathematical or scientific reasoning, one sees that axiom 3 is problematic too. Just because there is a lack of evidence for something doesn’t mean there is evidence of that lack of evidence.

Tie-break Criteria and the Manchester Clubs

The 2011/12 Premier League season finished with Manchester United and Manchester City level on points at the top of the table. Different leagues have different rules on what happens in this situation.

In the past the relevant tie-breaker was ‘goal average’. This is the ratio of goals scored to goals conceded. It has not been used in the English Football League since 1975-76. It seems like an odd measure, as intuitively it is the number of goals you win by which is important, not the ratio. For instance, surely 2-1 is just as good as 1-0?

This is no longer used, presumably for the reason that it makes conceding goals too costly. Goal difference is used instead. This is the difference between the number of goals scored and the number conceded.

In tournaments like La Liga, Serie A, the Champions League group stage and the European Cup group stages, it isn’t immediately used. If two or more teams are equal on points then a new league table is formed containing only those teams and only the games between them. This is then ordered first by points, and then by goal difference (and then it will typically continue to goals scored, and so forth). The ordering of this mini-league then orders the tied teams in the original league.

[As an aside, I find it slightly odd that the rules are not applied recursively.]

In other tournaments, like the Bundesliga, the Premier League and the World Cup group stage, goal difference across all games is compared after points. This is what happened to Manchester United and Manchester City, with Manchester City becoming the first team to win the league on goal difference. (The only time there has been a tie on points at the top of the table since goal average was abolished was 1988-89, when Arsenal and Liverpool were level on both points and goal difference, and Arsenal won it on goals scored.)

Both teams won, lost and drew the same number of games, giving them the same number of points. However, Manchester City had a goal difference of 64 and Manchester United had a goal difference of 56.

Goal difference is better than goal average in that it only cares about the difference in scores. Nevertheless, it is perhaps too linear. Is winning four games 1-0 and then winning a single game 6-0 better than consistently winning 2-0 over five games? Goal difference treats them as the same. If they aren’t equal, it isn’t particularly obvious which is better. The latter team were consistently better, but the former team achieved a large victory. There seems to be an argument either way.

The following is a graph of the distribution of wins of the two Manchester teams. As it is already decided that it is only the goal difference in a match which matters, this is all that is graphed. Obviously the total to the left of ‘0’ denotes the number of losses, and the total to the right of it the number of wins.

It should be noted that when the Manchester clubs played each other, United lost 1-0 away and 6-1 at home. Therefore if a head-to-head system were to be used which went to goal difference, City would still win. Goal average would not help United either – they conceded more goals and scored fewer, so any sensible combination of these two numbers will see City win. Had the United-City game at Old Trafford been 1-0 instead of 6-1, the goal difference gap would have changed by 8, but City still would have won on goals scored.

The graph shows that United did achieve more very large wins than City though. United won on five occasions by five or more goals. City did this ‘only’ twice. Perhaps there is some hope for United after all, if we change our values?

Is winning by two goals twice as good as winning by 1? Perhaps not. Winning by one can be luck, but winning by two seems like there might have been some skill involved. Perhaps instead of adding up the games’ goal differences, one should add up the squares of the games’ goal differences. Obviously, one would have to subtract instead of add for a loss, otherwise losing by 2 goals would be just as good as winning by 2 goals!

If this is done, Manchester United get a score of 188, but Manchester City get 212. City would still win.

Okay, let’s say winning by two goals is twice as good as winning by one. But then is winning by three goals only 1.5 times better than winning by two? Perhaps it is twice as good again. Continuing in this way, winning by 5 goals is 16 times better than winning by 1. If you lose, it should be the same as if you win except with the opposite sign, and if it was a draw the value is zero.

If we do this, then Manchester City get 112 points, but Manchester United get 121.

Nevertheless, this method is fairly silly, as it means the game Manchester United won 8-2 at the start of the season skews everything. Perhaps the truth is the opposite: the first goal is very important, but each goal after that becomes less and less important. After all, 5-0 and 6-0 are basically the same score. The last goal doesn’t matter very much at all. Perhaps the value grows more like a logarithm than a power.

Using \log{(1+|d|)} \times \rm{sign}(d) gives Manchester United 38.1 points, but Manchester City 32.3.
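
For anyone who wants to experiment, the scoring rules above are easy to write down. The sketch below uses an invented list of match margins purely to show the shape of the calculation; it is not the real Manchester data, and the logarithm is taken as natural:

    import math

    def linear(diffs):       # ordinary goal difference
        return sum(diffs)

    def squared(diffs):      # square each margin, keeping its sign
        return sum(math.copysign(d * d, d) for d in diffs)

    def doubling(diffs):     # each extra goal doubles the value: sign(d) * 2^(|d|-1)
        return sum(math.copysign(2 ** (abs(d) - 1), d) for d in diffs if d != 0)

    def logarithmic(diffs):  # log(1+|d|) * sign(d)
        return sum(math.copysign(math.log(1 + abs(d)), d) for d in diffs)

    margins = [1, 2, 0, -1, 6, 2]   # made-up margins for six matches, not real data
    print(linear(margins), squared(margins), doubling(margins), logarithmic(margins))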

In summary, there is no easy answer to a tie-breaking criterion that rewards exactly the things you want it to. At least goal difference is simple, and so it seems a good choice.

A Bit of Graph Theory

I’ve written a document which explains some basic graph theory. In particular, it poses a few puzzle questions related to planar graphs and then solves them using Euler’s Formula.

I tried to write it in a casual style. The idea was that it should be easy to read whilst not giving up any mathematical accuracy. Actually, I’m not sure the style works at all, but it was worth experimenting.

Thank you to Kris for some suggestions to improve it. A version suitable for A5 booklet printing is available on my website.


Click to view/download

Self-Referential Number Square #1 (Updated)

I’ve updated the first self-referential number square so that there isn’t any need for any additional clues at the bottom. It now looks as follows:

Suggested solution

A set of numbers such that when you remove one number, the rest can be arranged into two equal-sized groups with the same sum

Suppose you have a set of 17 numbers with the property that if you remove any one of the numbers, the remaining 16 can be split into two groups of 8 which sum to the same total. We will show that the numbers must all be equal.
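
Before starting on the proof, here is a small Python sketch of the property itself, restricted to integer entries: it simply tries every split by brute force, and is handy for convincing yourself that examples with unequal numbers are hard to find.

    from itertools import combinations

    # True if removing any single number lets the remaining ones be split into
    # two equal-sized groups with the same sum (integer entries assumed).
    def has_property(nums):
        for i in range(len(nums)):
            rest = nums[:i] + nums[i + 1:]
            half = len(rest) // 2
            target, rem = divmod(sum(rest), 2)
            if rem or not any(sum(c) == target for c in combinations(rest, half)):
                return False
        return True

    print(has_property([3] * 17))        # True: seventeen equal numbers work
    print(has_property([3] * 16 + [5]))  # False: one differing number breaks it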

If we have a set of 17 numbers with this property, then we can create another set of 17 numbers with the property by subtracting a constant from each. So if a_1, \ldots, a_{17} are one set of numbers with the property, then so are b_i = a_i - K.

Similarly, if a_i have the property, then so do c_i = K a_i, where K is a constant.

In particular, this means that we can shift it so that one of the numbers is zero.

To begin, let us assume the numbers in question are all integers. Let us take out the number which is zero. Therefore we get two groups which both sum to the same (integral) value, call it S. Therefore the sum of all the numbers T is simply 2S + 0. Therefore T is even.

Let us now take out another one of the numbers, call it k. Again we get two groups of numbers which sum to the same value, call it S’. So T = 2S’ + k. As T is even, this implies that k is even too.

Therefore each of the numbers is even. But if that is the case, we can divide all the numbers by two to get another set of integers with the property. This process can be continued indefinitely, which is impossible unless all the numbers were zero to start with. Thus we conclude all the numbers were equal to zero. That means in the original case, before the constant was subtracted, they must all have been equal. (cf. proof by infinite descent)

The case for rational numbers is simple, as you may just multiply all the rational numbers by a constant in order to make them all integers (for instance, by multiplying by the product of the denominators). Thus they are all equal.

The case for real numbers requires a trick. Denote the 17 numbers by a_i. These numbers generate a finite dimensional vector space over the rational numbers (of dimension at most 17, call it N). Therefore we can choose a basis (without the Axiom of Choice). Call this basis r_j, 1 \le j \le N \le 17.

Therefore each of the a_i can be written uniquely as a sum of the r_j multiplied by a rational number. Write [a_i]_j for the coefficient of r_j in this representation.

Fix j. Then the set of 17 numbers [a_i]_j, 1 \le i \le 17 form a set of rational numbers with the property that taking out any one of them allows the remaining to be rearranged into two groups with the same sum. This is because the set of 17 real numbers have this property, and whatever the two sums are, the r_j coefficients must be equal by the uniqueness of basis representations.

However, this implies by the above that the value of [a_i]_j is constant for different values of i.

Therefore all the coordinates are individually constant, and so the 17 numbers are all equal.

This problem is from Problems and Theorems in Classical Set Theory. Obviously, no use is actually made of the fact there are 17 numbers. In fact, no use is made of the fact that the two groups of numbers are of equal size.

Was Wronski Wrong?

I was reminded of a problem from a text book from school. The book was MEI Pure Mathematics 5, First Edition. It had the following question:

The Polish mathematician Hoëné Wronski (1778-1853) once wrote that:

\pi = \frac{2\infty}{\sqrt{-1}}\left\{ (1+\sqrt{-1})^{1/\infty} - (1-\sqrt{-1})^{1/\infty}\right\}

Was Wronski wrong?

At the time I found this problem very hard and I was very proud to have solved it. Now I am older, I don’t understand how I could have ever found it difficult, but I suppose that is a good thing.

Let us first turn it into modern notation. We wish to find q, where:

q = \lim_{x \rightarrow \infty} \frac{2x}{i}\left\{ (1+i)^{1/x} - (1-i)^{1/x}\right\}

Note that 1+i = \sqrt{2} e^{i \pi/4} and 1-i = \sqrt{2} e^{-i \pi/4}. Using these representations,

q = \lim_{x \rightarrow \infty} -2^{1+\frac{1}{2x}} xi \left\{ e^{i\pi/(4x)} - e^{-i\pi/(4x)}\right\}

And so

q = \lim_{x \rightarrow \infty} -2ix \cdot 2^{1/(2x)} \left(2i \sin (\pi/(4x))\right)

Now, if we use the fact that \sin {x} \approx x for small x, we have:

q = \lim_{x \rightarrow \infty} 2^{\frac{1}{2x}} \pi

and hence q =\pi as required.
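
A quick numerical sketch, with x standing in for Wronski’s infinity, makes the limit believable:

    # Evaluate 2x/i * ((1+i)^(1/x) - (1-i)^(1/x)) for increasingly large x.
    def wronski(x):
        return 2 * x / 1j * ((1 + 1j) ** (1 / x) - (1 - 1j) ** (1 / x))

    for x in (10, 1000, 100000):
        print(x, wronski(x))   # the real part approaches pi; the imaginary part is negligible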

So, yes, Wronski was right.

Arithmetic: then and now

The Conservatives recruited Carol Vorderman to head a ‘mathematics task force’ before the election, in 2009. After the election, in August 2011, the ‘task force’ reported (Telegraph article / Original Report). I find it a little bit puzzling that Vorderman was selected for this, as she is neither a teacher nor a mathematician. (She only managed a 3rd in each of her years at Cambridge studying Engineering.) Nevertheless, the bulk of the work was done by Roger Porkess, it appears, who is head of MEI. He is good.

The report has some good points in it. Nevertheless, page 52 is interesting. It shows three exam questions from previous papers, and the following remark:

Our conclusion from looking at the papers from these three years is that O level was harder than what came before or after it. However, a comparison is not entirely fair because GCSE is designed for a very wide range of students: O levels were not. Discussion of the standard of this, or any other, qualification needs to take place in the context of its fitness for purpose. However, there is compelling evidence that grade inflation has occurred.

The questions are as follows:

(In what follows I will answer the questions. This is probably fairly pointless, as if you read this blog you can almost definitely do them yourself.)

I think the 2011 GCSE question is actually pretty good, as approximating an expression is a useful, practical skill which is not so mechanical. Presumably, this was a question in an exam without a calculator. (At least when I did the exam, there was a non-calculator paper and a calculator-based one.)

If I were to answer the question, I would multiply the fraction 95.9/(0.81\times 0.62) ‘top and bottom’ by 100, and approximate 95.9 as 100, to give 100^2/(8.1\times 6.2). The product in the denominator is then approximately 8\times6 = 48 \approx 50 and thus you get 100\times100/50 = 200, as required.
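
(A quick check, not available to the candidates of course: the exact value is roughly 191, so the estimate of 200 is a reasonable one.)

    exact = 95.9 / (0.81 * 0.62)
    estimate = 100 * 100 / 50
    print(round(exact, 1), estimate)   # about 191 against the estimate of 200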

I consider that a fairly good, practical problem. It’s much shorter than the other two questions, but this can be attributed to a change of style.

The 1971 O Level question is not as good in my opinion. The first part is, I suppose, a test of their understanding of adding fractions. That the whole thing is itself in a fraction is presumably just to make the question seem harder. It shouldn’t really make the question any harder, but I suppose under pressure in an exam it does. The obvious thing to do is to multiply the fraction ‘top and bottom’ by xy. This gives:

\frac{xy}{x+y} = \frac{3 \times 7}{3+7} = \frac{21}{10} = 2.1

The fact that 3 and 7, which sum to 10, were chosen makes the question easier if it wanted an answer as a decimal expansion. Presumably an answer as a fraction would be acceptable too.

Let us skip the second part for a moment and head to the third. Clearly the values of p and q do not matter here, as they cancel out. The answer is hence merely 3. If for whatever reason you do not notice this, you have a lot of calculating to do! Nevertheless, it is a simple factorisation and would certainly be tested in more modern exams.

Going back to the second part – it is a pain really. Log tables are the easiest solution within the rules of the exam. Rather than find any particular shortcut, it is easier to ‘just do it’. Find the log of p, double it, anti-log it. Do the same with q. Add the two numbers together. Log it. Halve it. Anti-log it. Then we have the answer. It’s boring, and how is it any better than using a calculator? All you are doing is looking some numbers up in a table and occasionally adding or halving them. It’s just a procedure. Perhaps it was useful then, but it isn’t useful now.

The first question, from the 1931 School Certificate, I actually like quite a bit. Let us do the third part first, as it is the easiest. It is well known that 1/3 = 0.333…, and that 1/100 = 0.01. Thus 1/200 = 0.005. It follows that:

1/3 - 1/100 - 1/200 = 0.318333\ldots \approx 0.31833

Now let us go back to the first part. We wish to find the decimal expansion of 7/22. (Note that 22/7 is a popular approximation of pi, which comes from the continued fraction expansion of the number; 7/22 is also the approximation from the first three terms in the continued fraction expansion of 1/pi.)

We could just use long division and not think about what we are doing, but I prefer to do these calculations more explicitly. To find the first decimal place of 7/22, we can multiply it by 10 and then take the integer part. This gives 70/22 = 3 + 4/22. Hence it starts 0.3. We then continue with 4/22 – multiply by 10 and take the integer part. This will clearly give the next decimal place. Now 40/22 = 1 + 18/22. Thus the expansion of 7/22 starts 0.31. Continue again: 180/22 = 8 + 4/22. Thus it continues 0.318. We have already seen 4/22, and hence it will yield a sequence continuing 1 then 8 forever. Hence the expansion is 0.318181818…, or 0.31818 to five decimal places.

The second part is unfortunately basically identical, and so is rather dull. We have:

  • 710/223 = 3 + 41/223
  • 410/223 = 1 + 187/223
  • 1870/223 = 8 + 86/223
  • 860/223 = 3 + 191/223
  • 1910/223 = 8 + 126/223
  • 1260/223 = 5 + 145/223

And hence the decimal expansion starts 0.318385, and so is 0.31839 to 5 decimal places.
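
The digit-by-digit procedure is easily mechanised. The following sketch reproduces both expansions (judging by the working, the fraction in the second part is 71/223):

    # Repeatedly multiply the remainder by 10 and take the integer part, as above.
    def decimal_digits(p, q, places):
        digits = []
        r = p % q
        for _ in range(places):
            r *= 10
            digits.append(r // q)
            r %= q
        return digits

    print(decimal_digits(7, 22, 6))    # [3, 1, 8, 1, 8, 1] -> 0.318181...
    print(decimal_digits(71, 223, 6))  # [3, 1, 8, 3, 8, 5] -> 0.318385...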

Therefore our numbers are 0.31818, 0.31839 and 0.31833. The closest to the target number of 0.31831 is thus the third. (What were they playing at! 106/333 is much closer than all of them!)

Perhaps the third part is still useful today, but exactly evaluating the decimal expansion of a fraction probably isn’t, because of computers. So the third part is merely the more modern skill.

It’s hard to say really which questions were harder. I didn’t like Part II of the second question as I couldn’t think of an easy way to do it; the other parts of that question were brainless, however. The third question was interesting, but the first question was the most interesting, except for its Part II, which required a load of arithmetic almost identical to what you had just done, which is boring. No wonder everybody hates maths.

I don’t have any solid conclusions myself. Indeed, when I was at school I looked through A-Level mathematics exams from the 1960s onwards. I actually thought the hardest papers were in the years 1990-1992. Previous years did not require you to understand anything, but just required you to ‘do’ things which could be learnt without understanding why. Later years were just easy. Happily, the exams I did myself were of the easy variety!

It is often said that people are worse at maths now than previous generations were. Perhaps this is true, but I’m not entirely convinced previous generations were any good at it either. For instance, perhaps people who went to school in the 1970s can multiply and divide long integers, but in my experience from having seen people do it, they merely perform an algorithm they have learnt and never understand it. My calculator can do that, and it can do it better. When something goes wrong, they don’t know what they might have done wrong because they don’t know why it works. They might not remember what to do when they encounter a zero, etc.

Perhaps I will try and expand on this subject in future.

Expected number of different birthdays

Suppose there are n people in a room. The well-known ‘birthday problem’ says that if n > 22 the probability of two people sharing the same birthday is greater than 50%, and so it may well be that there aren’t n unique birthdays. However, what is the expected number of unique birthdays?

[Let us assume there are D days in a year, in order to be more general.]

Actually, the expected number of unique birthdays is very easy to calculate. Let e(n) denote the expected number of unique birthdays when there are n people.

Choose one of the n people. Then either they have a different birthday than everybody else, or they don’t. They have a different birthday than the other n-1 with probability \left(\frac{D-1}{D}\right)^{n-1}. Let us write \phi for (\frac{D-1}{D}), and so the probability that they have a different birthday to everybody else is \phi^{n-1}. In this case, the expected number of unique birthdays is 1+e(n-1). In the other case, which thus has probability  1 - \phi^{n-1}, the expected number is just e(n-1).

Hence,

 e(n) = \phi^{n-1}(1+e(n-1)) + (1-\phi^{n-1})e(n-1) = \phi^{n-1} + e(n-1)

As e(1) = 1, it follows that:

e(n) = \phi^{n-1} + \phi^{n-2} + ... + \phi + 1 = (\phi^n - 1)/(\phi-1) = D(1-\phi^n)

 (As one should expect, as n tends to infinity this tends to D, as it is almost certain that every birthday gets taken)

For instance, with D = 365, you should expect about 21 different birthdays when there are 22 people, or 46 different birthdays when there are 50 people. When there are 1000 people, there will be around 341 different birthdays. Perhaps it is quite surprising that you should expect there to be so many (24) unclaimed birthdays with so many people!
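
Both the formula and a quick simulation are easy to check in a few lines. This is just a sketch; the simulation assumes uniformly random birthdays, as the whole post does:

    import random

    D = 365

    def expected_unique(n, days=D):
        """Expected number of distinct birthdays among n people, from the formula above."""
        return days * (1 - ((days - 1) / days) ** n)

    def simulated_unique(n, trials=5000, days=D):
        """Monte Carlo estimate, as a sanity check on the formula."""
        return sum(len({random.randrange(days) for _ in range(n)})
                   for _ in range(trials)) / trials

    for n in (22, 50, 1000):
        print(n, round(expected_unique(n), 1), round(simulated_unique(n), 1))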

Consider the following problem:

Suppose a factory is in a country with the following law: Every worker must work every day, unless it is the birthday of one of the workers. In this case, everybody has the day off. How many people should you hire in order to maximize the expected number of man-days worked every year?

Naturally, if there is only a single person he will certainly work 364 days (D-1) a year. If there are two, they are likely to have different birthdays, in which case each will work 363 days a year (D-2), giving a total of 726 days.

The expected number of man-days worked will be the number of people, n, multiplied by the expected number of non-birthdays. The expected number of non-birthdays is just D - e(n). Thus we are trying to maximise the following:

W(n) = n(D-e(n)) = nD\phi^n

Considering n as a real number instead of an integer, one can find that the function W has a maximum at n = -1/\log\phi = -1/\log(1-1/D) \approx D - 0.5. Thus the maximum must be around this, and so for all but the smallest D (where the approximation is not valid), the maximum will be either D-1 or D.

In fact, it is easily seen that W(D) = W(D-1). It follows that one should employ either D or D-1 people. Since you will probably have to pay these workers, it is better to go for the D-1 people.

A simple change of this problem is to consider that for each man-day of work in the factory, you earn £a. However, you must pay each worker £y per year. Then instead you are maximising the yearly income of:

V(n) =anD\phi^n - ny

The derivative of this is V'(n) = aD\phi^n + a D n \phi^n \log \phi - y. Hence if V'(n) = 0 we must have:

y/aD = \phi^n(1+n \log \phi) = e^{n \log \phi} ( 1 + n \log\phi )

Define w = 1 + n\log\phi so that:

y/aD = w e^{w-1}

And so we have that:

ey/aD = w e^{w}

Let us note that y/aD is the ratio between what you pay the worker per year, and what he earns for you per year if he were to work every day. Let us call this ratio P.

Note that when z is positive, the equation z = w e^w always has a unique solution. The function which takes z to w is called the ‘Lambert W Function‘. Hence we have that:

W(eP) = W(ey/aD) = w = 1 + n\log\phi

It follows that:

n = (W(eP)-1)/\log\phi

As P will be small, and in particular less than 1, an approximation of the Lambert W function may be useful. We will use the Taylor Series approximation W(x) \approx x-x^2+3x^3/2 - 8x^4/3. Along with the usual log approximation, this gives:

n \approx D-\frac{e y}{a}+\frac{e^2 y^2}{a^2 D}-\frac{3 e^3 y^3}{2 a^3 D^2}+\frac{8 e^4 y^4}{3 a^4 D^3}

Or, in terms of P,

n \approx \frac{1}{6} D \left(6-6 e P+6 e^2 P^2-9 e^3 P^3+16 e^4 P^4\right)

For instance, suppose that I pay my workers £18,000 per year. They generate about £350 in profit for me each day they work. They work every day, except for when one of them has a birthday. By evaluating the W function exactly, I find that the maximum expected profit is when there are around 260 workers. The approximation yields 269 workers. If the profit they generate per day were decreased to £75 then the approximation is completely inaccurate. Whilst I should only employ 72 people, the approximation gives over 7,000!
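
As a final sketch, here is the calculation for the £18,000 and £350 example, using SciPy’s Lambert W routine (any implementation of W would do); it reproduces the 260 and 269 quoted above:

    import math
    from scipy.special import lambertw   # any Lambert W implementation would do

    D = 365        # days in the year
    a = 350.0      # profit per man-day worked
    y = 18000.0    # yearly wage per worker

    phi = (D - 1) / D
    P = y / (a * D)

    # Exact optimum: n = (W(eP) - 1) / log(phi)
    n_exact = (lambertw(math.e * P).real - 1) / math.log(phi)

    # Series approximation: n ~ D(1 - eP + e^2 P^2 - (3/2) e^3 P^3 + (8/3) e^4 P^4)
    eP = math.e * P
    n_approx = D * (1 - eP + eP ** 2 - 1.5 * eP ** 3 + (8 / 3) * eP ** 4)

    print(round(n_exact), round(n_approx))   # roughly 260 and 269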