This is the first of a series of posts highlighting mathematical aspects of card collecting. In this post we look at the challenge of completing a set from packs.

One of the most enjoyable but also frustrating parts of card collecting is the dying art of collecting a set of cards from packs. True, almost nobody does this anymore, given how easy it is to purchase entire sets or pick up just the cards you need from your local card shop or sites like eBay or COMC. However, any “old school” collector worth his cardboard knows there are only two ways to truly earn the achievement of set completion–

• Opening packs

As it turns out, the challenge of completing a set from packs gives rise to some interesting mathematics. For the average reader, the math is difficult enough that we will at least temporarily abandon the problem of completing a large set (e.g., 1978 Topps with 726 cards) from multi-card packs and focus on a very small set with single-card packs. In future posts we may progress to more demanding settings.

### A MORE BITE-SIZED SET

A good example we can start with–especially since I’m writing this on Mother’s Day–is the 1989 Mother’s Cookies Jose Canseco set. Each specially marked package of cookies came with a single Jose Canseco card, and there were four cards that made up the entire set. I remember collecting this set when it came out and needing to eat a LOT of cookies along the way! Was I just unlucky (if you could call eating cookies unlucky), or was it “in the cards” that this set would take so long to complete?

Before getting out the calculators, there are a few quick observations that govern the problem–

• Completing the set would require a minimum purchase of four packages.
• The more packages purchased, the greater the probability of set completion.
• But…no finite number of packages, not even 50, guarantees set completion.

What then is the precise relationship between the number of packages purchased and the probability of set completion?

#### Four Packages

For starters, we will look at the probability of completing the set with a minimal purchase of four packages. If we treat the four different Canseco cards as A, B, C, and D, then we can actually list all 4^4 = 256 possible results. A well-organized list might start out something like this—

AAAA  AAAB  AAAC  AAAD  AABA  AABB  AABC  AABD  AACA  AACB  AACC  …

Of these results, the ones corresponding to complete sets would be exactly those with one of each letter—i.e., the solution ABCD and all its re-orderings. Because there are 4! = 24 such orderings of ABCD, the probability of set completion in four purchases is 24/256 = 3/32, which is about 9%. In other words, you’d have to be VERY lucky to complete your set this way.
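For readers who would rather let a computer do the listing, here is a minimal Python sketch of the same count. It assumes, as the discussion above does implicitly, that each of the four cards is equally likely in every package; the function name `count_complete` is just an illustrative choice.

```python
from itertools import product

def count_complete(num_packs, cards="ABCD"):
    """Count the pack sequences of length num_packs that contain every card.

    Returns (number of complete-set sequences, total number of sequences).
    Assumes each card is equally likely in every package.
    """
    total = 0
    complete = 0
    for seq in product(cards, repeat=num_packs):
        total += 1
        if set(seq) == set(cards):  # at least one of each letter
            complete += 1
    return complete, total

complete, total = count_complete(4)
print(complete, total, complete / total)  # 24 256 0.09375
```

The same function works for larger purchases, though the list it walks through grows by a factor of four with every extra package.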

#### Five Packages

From here, we can take a similar approach to finding the probability of set completion in five purchases. The number of possibilities becomes 4^5 = 1024. Were we to list all of them, our list might start out something like this—

AAAAA  AAAAB  AAAAC  AAAAD  AAABA  AAABB  AAABC  AAABD  AAACA …

Once listed, we would then count up all sequences that include at least one of each letter. However, we can do better than that. Aside from order, we know the only sequences we are interested in are AABCD, ABBCD, ABCCD, and ABCDD. Of course, each of these four sequences can occur in many different orders, so we will need to find out just how many before we can solve the problem.

In general a sequence of five elements has 5! = 120 different orderings possible. When two of the elements are identical, we reduce the number of orderings by a factor of 2. Therefore, each of our solutions (e.g., ABBCD) will be represented 60 times among the list of 1024 possible results. Since there are four such solutions, our probability of set completion is 240/1024 = 15/64, which is about 23%. Comparing this to our previous result, the probability of set completion more than doubles with the purchase of the fifth pack. However, it is still rather low.

#### Six Packages

Now let’s take a look at what happens when six packages are purchased. There are 4^6 = 4096 possible results from the six packages. And once again, we will have complete sets if, aside from order, we end up with these solutions:

AAABCD, ABBBCD, ABCCCD, ABCDDD

However, we also need to consider these other solutions, which (apart from order) are even more common.

AABBCD, AABCCD, AABCDD, ABBCCD, ABBCDD, ABCCDD

In the first grouping, where there are triplicates of a particular card, the number of orderings for each sequence is 6!/3! = 120. In the second grouping, where there are two pairs of duplicates, the number of orderings per sequence is 6!/(2! × 2!) = 180. Therefore the total number of solutions = (4 x 120) + (6 x 180) = 1560, making the probability of set completion 1560/4096, which is about 38%.
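The counting used for four, five, and six packages can be sketched in Python as follows. The idea is to run through every way of splitting the packages among the four cards so that each card appears at least once, and to add up the number of orderings for each split. The helper names (`compositions`, `orderings`, `complete_sequences`) are my own, and the sketch again assumes all four cards are equally likely.

```python
from math import factorial

def compositions(total, parts):
    """Yield every way to write total as an ordered sum of `parts` positive integers."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def orderings(counts):
    """Number of distinct orderings of a sequence with the given card counts.

    For counts (1, 2, 1, 1), i.e. ABBCD, this is 5!/2! = 60.
    """
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def complete_sequences(num_packs, set_size=4):
    """Count the sequences of num_packs cards that include every card in the set."""
    return sum(orderings(c) for c in compositions(num_packs, set_size))

for n in (4, 5, 6):
    print(n, complete_sequences(n), 4 ** n)
# 4 24 256
# 5 240 1024
# 6 1560 4096
```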

#### More than Six Packages

It is possible to follow an approach similar to the above to continue through ever larger numbers of packages. With enough careful thought, it is even possible to arrive at a formula connecting the probability, P, of set completion to the number, N, of packages purchased. And for the truly ambitious, it would be a fairly small matter from there to extend the formula to sets with more than 4 cards. However, I thought it would be fun to share what feels like a completely different approach, which I believe will be more accessible to the less mathematical reader who has not already studied combinatorics.
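For the curious, one standard way to write such a formula for the four-card set uses inclusion-exclusion, assuming all four cards are equally likely in every package. It is offered here only as a reference point and is not needed for anything that follows.

```latex
P(N) \;=\; \sum_{k=0}^{4} (-1)^k \binom{4}{k} \left(\frac{4-k}{4}\right)^{N}
     \;=\; 1 - 4\left(\frac{3}{4}\right)^{N} + 6\left(\frac{1}{2}\right)^{N} - 4\left(\frac{1}{4}\right)^{N},
\qquad N \ge 1.
```

As a check, N = 4 gives (256 − 324 + 96 − 4)/256 = 24/256, N = 5 gives 240/1024, and N = 6 gives 1560/4096, matching the counts found above. The step-by-step approach described next does not rely on this formula.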

The rough outline of the approach is this–

• After purchasing N > 0 packages, the collector will have either 1, 2, 3, or 4 different cards. Let M represent this number.
• Given M, it is very easy to determine the probability that M will either stay the same or increase by 1 with the purchase of an additional package. (And it is also very easy to see why these are the only two options.)
• If the probability distribution for N packages (and feel free to imagine N as a small number like 4) is known, there is a simple way to generate the probability distribution for N + 1.

Although this technique will work even when N is less than 4, let’s take our starting point as N = 4. We now need to find the probability of 1, 2, 3, and 4 different cards respectively. Much earlier, we already found that the probability of 4 different cards was 3/32. Though there are always shortcuts and tricks, let’s assume at worst that we identified the remaining probabilities by simply listing all 256 possible results and counting things up. If so, we would have found the following. (I maintained a common denominator to make it easy to check that the probabilities indeed sum to 1.)

For each case (i.e., value of M), we can now compute the probability that M will increase by one or remain the same with the purchase of one more package.

A powerful property of the probabilities in Table 2 is that they are independent of N. (Technically one could argue about whether this is true for N < 4, but our main interest from here is N ≥ 4 regardless.) To give a simple example of what this means, no matter how many packages have been purchased, if you have 3 of the 4 cards in the set the probability you will complete your set with your next package is always 1/4. As a result, the approach that allows us to progress from N = 4 to N = 5 is one that will allow us to go from any value of N to N + 1.

In case the calculations shown in Table 3 are unclear, here is how the 5-package entry for M = 2 was derived.

• The probability of already having exactly two different cards at N = 4 and not getting a new card in the fifth package is 21/64 × 2/4.
• The probability of already having only one card at N = 4 and then getting a new card in the fifth package is 1/64 × 3/4.
• These are the only two ways to end up with two different cards after opening the fifth package. Because they are mutually exclusive, their probabilities can be added to arrive at the likelihood of ending up with two different cards after five packages.
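Written out as a single line of arithmetic, the three bullets above combine as follows.

```latex
P(M = 2 \text{ after } 5 \text{ packages})
  \;=\; \frac{21}{64}\cdot\frac{2}{4} \;+\; \frac{1}{64}\cdot\frac{3}{4}
  \;=\; \frac{42}{256} + \frac{3}{256}
  \;=\; \frac{45}{256} \;\approx\; 17.6\%
```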

Advantages of this approach are that it is (eventually) quite intuitive and that it requires only basic knowledge of probability as opposed to more advanced combinatorics. And while it doesn’t easily lead to a general formula P(N) expressing set completion probability as a function of the number of packages, it does lend itself very easily to iteration in a spreadsheet program like MS Excel.
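For anyone who prefers code to a spreadsheet, the iteration might be sketched in Python like this. A few assumptions are mine rather than part of the method as described: the function names are made up for illustration, exact fractions are used in place of decimals, and the iteration starts from zero packages (no cards in hand) rather than from N = 4.

```python
from fractions import Fraction

def next_distribution(dist, set_size=4):
    """Advance the distribution of M (number of different cards held) by one package.

    dist[m] is the probability of holding exactly m different cards.
    """
    new = [Fraction(0)] * (set_size + 1)
    for m, p in enumerate(dist):
        stay = Fraction(m, set_size)       # next card is one already held
        new[m] += p * stay
        if m < set_size:
            new[m + 1] += p * (1 - stay)   # next card is a new one
    return new

def distribution_after(num_packs, set_size=4):
    """Distribution of M after num_packs packages, starting with no cards."""
    dist = [Fraction(1)] + [Fraction(0)] * set_size
    for _ in range(num_packs):
        dist = next_distribution(dist, set_size)
    return dist

# Print P(M = 1), ..., P(M = 4) for a range of package counts.
for n in range(4, 11):
    dist = distribution_after(n)
    print(n, [str(p) for p in dist[1:]], round(float(dist[4]), 4))
```

The last entry of each row is the set completion probability, so the loop range can simply be extended to see how quickly it approaches 1.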

The table below provides all values through N = 28. Rows are highlighted that correspond to the set completion probabilities nearest 50%, 75%, 90%, 95%, and 99%.

Table 4

A graph of the same data (through N = 35) illustrates how the probability of each value of M changes as the value of N increases.

#### Other Applications

While most sports cards come more than one to a pack, there are a great many collectibles that come one to a pack–

• Older baseball cards such as T206 and 1933 Goudey
• More modern cereal cards such as 1979 Kellogg’s baseball
• 7-11 Slurpee baseball coins
• Most non-sports issues
• McDonald’s Happy Meal toys

It is also possible to apply the same model to non-collectibles, for example if you wanted to keep eating M&Ms until you ate at least one of each color.

#### Conclusions

Perhaps the biggest takeaway here, which most collectors have experienced firsthand many times over in their collecting careers, is that finishing a set from packs can be a very slow process. In the example of the four-card Jose Canseco set, it would not be unusual at all to have to buy 7-10 packages of cookies, resulting in a collection about twice as large as the set itself. And sure enough, as the size of a set gets larger, this phenomenon only gets worse.

Barring a typo in the London Cigarette Company catalog, this 1901 Ogden’s “Guinea Gold” cigarette card of Sir Isaac Newton was card #119 from a set of 1148! Can you even imagine how many cigarettes a collector would have to purchase in order to complete this massive set? Even at the point the collector had all the cards but one, he/she would still have less than a one-in-a-thousand chance of completing the set with each additional pack purchased. Recognizing the lifespan-reducing properties of the product itself, I have to imagine far fewer collectors completed this set than the number who literally died trying!
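To put a rough number on that question, here is a small Python sketch of the average number of packs needed. It relies on assumptions of mine that may well not hold for a 1901 cigarette issue: one card per pack, and every card equally likely in every pack. Under those assumptions, a collector holding m of the S cards in a set needs S/(S − m) packs on average to find the next new one, and adding those waits gives the total.

```python
def expected_packs(set_size):
    """Average number of single-card packs needed to complete a set.

    Assumes every card in the set is equally likely in every pack.
    Sums the average wait for each new card: S/S + S/(S-1) + ... + S/1.
    """
    return sum(set_size / k for k in range(1, set_size + 1))

print(expected_packs(4))      # 25/3, a little over 8 packages of cookies
print(expected_packs(1148))   # several thousand packs of cigarettes
```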

Fortunately for Jose Canseco fans, the Mother’s Cookies set proved a far less imposing challenge. Many of us–myself included–had to buy a lot more cookies than we originally planned to, but we ultimately made it. As for any collectors out there who didn’t fare so well, isn’t that the way the cookie crumbles!