## Friday, January 16, 2015

### M&M Chi-square

The following activity will make use of data gathered in the M&M Activity.

Every regular bag of plain M&Ms contains six colors: red, orange, yellow, green, blue, and brown.  In handling numerous samples of M&Ms, the question arises as to whether there is always about the same proportion of any given color; for example, is there always about the same proportion of green?

This activity will test that using the chi-square test.  We will not be able to prove that the proportion is the same; instead we will be able to determine one of two things:

1. The proportion is NOT the same.
2. The amount per bag is consistent with being the same proportion.

I will conduct my test using green M&Ms, as they are reputed to have mystical properties, the description of which is beyond the scope of our investigation.

I drew a new sample of ten bags of M&Ms which were not connected to the previous activity. The data was as below:

The first column is simply the bag number and does not figure into calculations.  The second column is the total number of M&Ms per bag. The third column is the number of green M&Ms in a given bag.

We will be conducting a chi-square test.  In this test, the null hypothesis is that all of the proportions are the same, so the test cannot prove that the proportions are the same.  It can, however, provide strong evidence that they are not the same.  We will compute a chi-square test statistic, and if it is too large we will be forced to conclude that the proportions are not the same.  Otherwise, we will say the data is consistent with having the same proportion per bag.

To do the calculations, I put my data into an Excel spreadsheet with columns as labeled below:

| Bag Number | N | O |
|---|---|---|
| 1 | 17 | 3 |
| 2 | 17 | 2 |
| 3 | 19 | 5 |
| 4 | 18 | 6 |
| 5 | 19 | 6 |
| 6 | 18 | 1 |
| 7 | 18 | 2 |
| 8 | 18 | 6 |
| 9 | 19 | 2 |
| 10 | 19 | 6 |
| Total | 182 | 39 |

The first column contains the bag number.  The second column, headed by N, contains the total number of M&Ms per bag.  The third column, headed by O, contains the number of green M&Ms per bag. The last entry in the N and the O columns is the total of that column.

The null hypothesis of the chi-square test allows us to assume all the proportions are the same in each sample and, therefore, to pool data.  So pHat, the proportion of M&Ms that are green, is 39/182, i.e. pHat is approximately 0.21.  Since the first bag contains 17 M&Ms, we would expect it to contain 17 × pHat = 17 × 39/182 ≈ 3.64 green M&Ms (the unrounded pHat is used here; 17 × 0.21 would give 3.57).  We call this the expected number and denote it by E.  We do this calculation for every bag and obtain the following table:

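| Bag Number | N | O | E |
|---|---|---|---|
| 1 | 17 | 3 | 3.64 |
| 2 | 17 | 2 | 3.64 |
| 3 | 19 | 5 | 4.07 |
| 4 | 18 | 6 | 3.86 |
| 5 | 19 | 6 | 4.07 |
| 6 | 18 | 1 | 3.86 |
| 7 | 18 | 2 | 3.86 |
| 8 | 18 | 6 | 3.86 |
| 9 | 19 | 2 | 4.07 |
| 10 | 19 | 6 | 4.07 |
| Total | 182 | 39 | |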
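
The pooling step can be reproduced outside the spreadsheet. The following is a minimal Python sketch, not the spreadsheet used for this post; the variable names `totals`, `greens`, `p_hat`, and `expected` are my own, and the numbers are copied from the N and O columns.

```python
# Bag totals (N) and green counts (O), copied from the table in this post.
totals = [17, 17, 19, 18, 19, 18, 18, 18, 19, 19]
greens = [3, 2, 5, 6, 6, 1, 2, 6, 2, 6]

# Pooled proportion of green M&Ms, justified by the null hypothesis.
p_hat = sum(greens) / sum(totals)

# Expected green count per bag: bag size times the pooled proportion.
# The unrounded p_hat is used, which is what gives 3.64 rather than 3.57.
expected = [n * p_hat for n in totals]

for bag, (n, e) in enumerate(zip(totals, expected), start=1):
    print(f"bag {bag}: N={n}, E={e:.2f}")
```

Rounding each entry of `expected` to two decimals should match the E column.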

Note that the expectations in the E column differ from the observed reality in the O column.  The question is whether this is too much to be accounted for by mere random variation.  This requires a measure.  The one we use is (O-E)^2/E.  We take the difference of O and E, square it, and then divide that by the expectation.  The division by E allows us to deal with variation that, while perhaps large in absolute terms, is small relative to the expected value.  The completed table appears below:

| Bag Number | N | O | E | (O-E)^2/E |
|---|---|---|---|---|
| 1 | 17 | 3 | 3.64 | 0.11 |
| 2 | 17 | 2 | 3.64 | 0.74 |
| 3 | 19 | 5 | 4.07 | 0.21 |
| 4 | 18 | 6 | 3.86 | 1.19 |
| 5 | 19 | 6 | 4.07 | 0.91 |
| 6 | 18 | 1 | 3.86 | 2.12 |
| 7 | 18 | 2 | 3.86 | 0.89 |
| 8 | 18 | 6 | 3.86 | 1.19 |
| 9 | 19 | 2 | 4.07 | 1.05 |
| 10 | 19 | 6 | 4.07 | 0.91 |
| Total | 182 | 39 | | 9.34 |
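
The last column and its total can be checked with a few lines of Python. This is a sketch that follows the calculation exactly as described in this post, summing (O-E)^2/E over the green counts only; the names `contributions` and `chi_square` are illustrative.

```python
# Bag totals (N) and green counts (O), copied from the table in this post.
totals = [17, 17, 19, 18, 19, 18, 18, 18, 19, 19]
greens = [3, 2, 5, 6, 6, 1, 2, 6, 2, 6]

# Pooled proportion and expected green count per bag.
p_hat = sum(greens) / sum(totals)
expected = [n * p_hat for n in totals]

# Per-bag contribution (O - E)^2 / E, then the total over all bags.
contributions = [(o - e) ** 2 / e for o, e in zip(greens, expected)]
chi_square = sum(contributions)

for bag, c in enumerate(contributions, start=1):
    print(f"bag {bag}: (O-E)^2/E = {c:.2f}")
print(f"total = {chi_square:.2f}")
```

Because the expectations are kept unrounded here, individual entries may differ in the last digit from a hand calculation that uses the rounded E values.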

The final entry in the (O-E)^2/E column is the total amount of error for observations versus expectations.  In this case that amount is 9.34.  The question is whether this is too much to be plausible.  To decide this, we need to look up the critical value in the chi-square table.  As there were 10 samples, there are 10 - 1 = 9 degrees of freedom. For DF=9, the critical value at the 0.05 significance level is 16.92.  As 9.34 is not larger than 16.92, we cannot reject the null hypothesis.  We must therefore conclude that the data is consistent with there being approximately the same proportion of green M&Ms in each bag.
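
For readers without a chi-square table at hand, the decision can be sketched in Python using only the standard library. The helper `chi2_upper_tail_odd_df` is my own illustrative function, not part of any library; it relies on the closed-form upper-tail expression for a chi-square distribution with an odd number of degrees of freedom, so it is an assumption-laden check on the table value rather than a general-purpose routine.

```python
import math


def chi2_upper_tail_odd_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch handles odd degrees of freedom only")
    root = math.sqrt(x)
    # Two-sided standard normal tail beyond sqrt(x).
    tail = math.erfc(root / math.sqrt(2))
    # Standard normal density evaluated at sqrt(x).
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    # Sum of root**(2r-1) / (1*3*...*(2r-1)) for r = 1 .. (df-1)/2.
    series, term, odd = 0.0, root, 1
    for _ in range((df - 1) // 2):
        series += term
        odd += 2
        term *= x / odd
    return tail + 2 * density * series


chi_square = 9.34   # total from the (O-E)^2/E column
critical = 16.92    # table value for DF=9
df = 10 - 1         # ten bags, so nine degrees of freedom

reject = chi_square > critical
tail_at_critical = chi2_upper_tail_odd_df(critical, df)
p_value = chi2_upper_tail_odd_df(chi_square, df)

print("reject null hypothesis:", reject)
print(f"upper-tail area at the critical value: {tail_at_critical:.3f}")
print(f"upper-tail area at the observed statistic: {p_value:.3f}")
```

The upper-tail area at 16.92 should come out very close to 0.05, which is the significance level that table entry corresponds to, and the area at 9.34 is far larger than that, in line with the decision not to reject.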