Iterating Subsets of a Specific Size

In my recent List object for Multimedia Fusion, I added a feature where the sub-lists of  particular size of a specified list can be iterated through. For instance, suppose we have the list [1,2,3,4,5] and we want to iterate through the sub-lists of size 3. There are ten of these, as shown below in no particular order:

  1. [1,2,3]
  2. [1,2,4]
  3. [1,2,5]
  4. [1,3,4]
  5. [1,3,5]
  6. [1,4,5]
  7. [2,3,4]
  8. [2,3,5]
  9. [2,4,5]
  10. [3,4,5]

These can be represents with a bit string of length 5, where the ith bit is on exactly when the ith element is included in the sub list. Therefore, we are looking for numbers between 0 and 31 with exactly three bits on.

It is arbitrary which bit corresponds to which element, but let us decide that the least significant (that is, right-most when written down) corresponds to the first element (i.e. 1 in our example), the second least significant bit the second element (i.e 2 in our example), and so forth. Under this scheme, with can associate the sub-lists above by the following numbers.

  1. [1,2,3] represented by 00111b = 7
  2. [1,2,4] represented by 01011b = 11
  3. [1,2,5] represented by 10011b = 19
  4. [1,3,4] represented by 01101b = 13
  5. [1,3,5] represented by 1010b1 = 21
  6. [1,4,5] represented by 11001b = 25
  7. [2,3,4] represented by 01110b = 14
  8. [2,3,5] represented by 10110b = 22
  9. [2,4,5] represented by 11010b = 26
  10. [3,4,5] represented by 11100b = 28

So, how to generate numbers of this form? It would be best if we could generate them in numerical order, and one suspects it might be easier to like that as well. So we want to generate the numbers 7, 11, 13, 14, 19, 21, 22, 25, 26 and 28.

In fact, it is easier to take one number and go to the next. So have a function which turns 7 to 11, 11 to 13 and so forth. How to do this is described in what follows.

Call the number we start with x. Suppose for now that x is such that all the ‘on’ bits are bunched together. For instance, 1110000b (112). In this specific instance, we want to turn it into 10000011b (131).

First we need to find the lowest ‘on’ bit and isolated it, denoting it as L. In our example, L = 10000b (i.e. 16). How is L calculated? It depends exactly on how x is represented in memory, but if it is a signed integer you can just do L = x \:\&\: (-x).

Now consider x + L, which is 1110000b + 0010000b = 10000000b in our case. As you can see it clears the run of 1’s, and sets the bit just left of it to ‘on’.

So the next thing to do is set the lowest order few bits to ‘on’, where ‘few’ means one less than the original number of bits which were on. If we knew the number of bits which were on in the first place, this would be very easy to do with a bit mask. However, we can do it another way which makes it unnecessary to explicitly work this out.

If we take 1110000b, we can right-shift the appropriative number of places so that we are left with 11. In fact, an easier way of doing this is just to divide it by 2L. It is clear this works as dividing by 2k is the same as right shifting by k. (This is obvious if you consider binary numbers as a sum of powers of two.)

If we ‘OR’ these two values together – that is x+L and x/(2L) – then we get the ultimate answer.

To sum up: Providing x has all it’s ‘on’ bits in one block, we can find the next one by (x+L) | (x/[2L]). This can be implemented as follows:

int next( int x )
{
     int L = x & (-x);
     int bigBit = x + L;
     return bigBit | (x/(2*L));
}

So finally there is the case where there are other ‘on’ bits left of the first block of ones. The easiest thing to do here is to reduce it to the problem we have just solved =) Just get rid of the of those bits and put them in again afterwards.

So, when we work out x + L, we want it to have just a single bit on. However, it may well turn out to have bits on to the left of that wanted bit. These unwanted bits are the only bits which are in common between x+L and x. Therefore we can find the bits which are (for now) unwanted by finding (x+L)\:\&\:x. We can then subtract it from x and apply the old method, adding the values back in again afterwards.

The C code could look as follows (where it uses the next function defined above):

int general_next( int x )
{
    int L = x & (-x);
    int bigBit = x+L;
    int unwanted = bigBit & x;
    return unwanted | next(x-unwanted);
}

The code has been presented with reference to the old function just to make it clearer how we are solving a specific problem and then reducing the general problem to it (a technique which is usefully the best way of solving maths problems). However, it is probably better for a program to have it all in one code. For instance,

int next( int x )
{
    int L = x & (-x);
    int bigBit = x + L;
    int unwanted = bigBit & x;
    x -= unwanted;
    return bigBit | ((x/L) >> 1);
}

In fact, the code in the List object is a bit tighter than that still. Also, instead of using usual integers to store these values, it uses arbitrary-length unsigned integers. It is thus not possible to calculate L in the manner shown on the first line, but in fact \log_2 L is easily received as the information is kept due to the method of storage. This doesn’t cause any problems: L can easily be formed by a bit-mask, and the division by L on the last line is better done as a shift in any case.

This allows the list object to find sublists of very long and even infinite lists.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: