Every regular bag of plain M&Ms contains six colors: red, orange, yellow, green, blue, and brown. After handling numerous samples of M&Ms, a question arises: is there always about the same proportion of any given color? For example, is there always about the same proportion of green?
This activity will test that using the chi-square test. We will not be able to prove that the proportion is the same; instead, we will be able to determine one of two things:
- The proportion is NOT the same.
- The counts per bag are consistent with the proportion being the same.
I drew a new sample of ten bags of M&Ms which were not connected to the previous activity. The data was as below:
The first column is simply the bag number and does not figure into the calculations. The second column is the total number of M&Ms per bag. The third column is the number of green M&Ms in a given bag.
We will be conducting a chi-square test. In the chi-square test, the null hypothesis is that all of the proportions are the same, so the test cannot prove that the proportions are the same. It can, however, give strong evidence that they are not the same. We will compute a chi-square test statistic, and if it is too large we will be forced to decide the proportions are not the same. Otherwise, we will say the data is consistent with having the same proportion per bag.
To do the calculations, I put my data into an Excel spreadsheet with columns as labeled below (N is the total number of M&Ms in the bag; O is the observed number of green M&Ms):
Bag Number | N | O |
1 | 17 | 3 |
2 | 17 | 2 |
3 | 19 | 5 |
4 | 18 | 6 |
5 | 19 | 6 |
6 | 18 | 1 |
7 | 18 | 2 |
8 | 18 | 6 |
9 | 19 | 2 |
10 | 19 | 6 |
Total | 182 | 39 |
The null hypothesis of the chi-square test allows us to assume the proportion is the same in every sample and, therefore, to pool the data. So pHat, the proportion of M&Ms that are green, is 39/182, i.e., pHat is approximately 0.21. Since the first bag contains 17 M&Ms, we would expect it to contain 17*pHat = 17*(39/182), or approximately 3.64, green M&Ms. (The table uses the unrounded pHat; multiplying by the rounded 0.21 would give 3.57 instead.) We call this the expected number and denote it by E. We do this calculation for every bag and obtain the following table:
Bag Number | N | O | E |
1 | 17 | 3 | 3.64 |
2 | 17 | 2 | 3.64 |
3 | 19 | 5 | 4.07 |
4 | 18 | 6 | 3.86 |
5 | 19 | 6 | 4.07 |
6 | 18 | 1 | 3.86 |
7 | 18 | 2 | 3.86 |
8 | 18 | 6 | 3.86 |
9 | 19 | 2 | 4.07 |
10 | 19 | 6 | 4.07 |
Total | 182 | 39 | |
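The pooled proportion and the E column can also be reproduced outside of Excel. The following is a minimal Python sketch, not the spreadsheet used in this post; the names `totals`, `greens`, `p_hat`, and `expected` are assumptions chosen for illustration, and the counts are copied from the table above.

```python
# Counts copied from the table above:
# N = total M&Ms per bag, O = observed green M&Ms per bag.
totals = [17, 17, 19, 18, 19, 18, 18, 18, 19, 19]
greens = [3, 2, 5, 6, 6, 1, 2, 6, 2, 6]

# Under the null hypothesis every bag shares one proportion, so pool the data.
p_hat = sum(greens) / sum(totals)  # 39 / 182

# Expected number of green M&Ms in each bag: E = N * pHat (unrounded pHat).
expected = [n * p_hat for n in totals]

# Print one row per bag in the same layout as the table.
for bag, (n, o, e) in enumerate(zip(totals, greens, expected), start=1):
    print(f"{bag} | {n} | {o} | {e:.2f} |")
```

Keeping `p_hat` unrounded matters here: rounding it to 0.21 before multiplying would shift every entry in the E column slightly.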
Note that the expectations in the E column differ from the observed reality in the O column. The question is whether the difference is too much to be accounted for by mere random variation. This requires a measure. The one we use is (O-E)^2/E: we take the difference of O and E, square it, and then divide that by the expectation. The division by E allows us to deal with variation that, while perhaps large in absolute terms, is small relative to the expected value. The completed table appears as below:
Bag Number | N | O | E | (O-E)^2/E |
1 | 17 | 3 | 3.64 | 0.11 |
2 | 17 | 2 | 3.64 | 0.74 |
3 | 19 | 5 | 4.07 | 0.21 |
4 | 18 | 6 | 3.86 | 1.19 |
5 | 19 | 6 | 4.07 | 0.91 |
6 | 18 | 1 | 3.86 | 2.12 |
7 | 18 | 2 | 3.86 | 0.89 |
8 | 18 | 6 | 3.86 | 1.19 |
9 | 19 | 2 | 4.07 | 1.05 |
10 | 19 | 6 | 4.07 | 0.91 |
Total | 182 | 39 | | 9.34 |
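As a cross-check on the last column, the per-bag values of (O-E)^2/E and their total can be computed with a short Python sketch. This follows the calculation exactly as described in this post (summing over the green counts only); the helper name `contribution` and the variable names are assumptions for illustration.

```python
# Counts copied from the table above.
totals = [17, 17, 19, 18, 19, 18, 18, 18, 19, 19]
greens = [3, 2, 5, 6, 6, 1, 2, 6, 2, 6]

# Pooled proportion of green M&Ms under the null hypothesis.
p_hat = sum(greens) / sum(totals)


def contribution(observed, expected):
    """One bag's share of the statistic: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


# One contribution per bag, with E = N * pHat.
contributions = [contribution(o, n * p_hat) for n, o in zip(totals, greens)]

# The test statistic is the column total.
chi_square = sum(contributions)
print(round(chi_square, 2))  # 9.34
```

Bag 6, with only one green M&M against an expectation of about 3.86, contributes the most to the total.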
The final entry in the (O-E)^2/E column is the total amount of error for observations versus expectations. In this case that amount is 9.34. The question is whether this is too much to be plausible. To decide this, we need to look up the critical number on the chi-square table. As there were 10 samples, this will require 9 degrees of freedom (one less than the number of samples). For DF=9, the critical number at the 0.05 significance level is 16.92. As 9.34 is not larger than 16.92, we cannot reject the null hypothesis. We must therefore conclude that the data is consistent with there being approximately the same proportion of green M&Ms in each bag.
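The table lookup can be supplemented with an upper-tail probability (a p-value). The sketch below is one way to do that using only the Python standard library; it is not how the lookup was done for this post. The function name `chi_square_sf` is an assumption for illustration, and it evaluates the standard power series for the regularized lower incomplete gamma function. The value 16.92 is the critical number quoted in the text, which corresponds to the 0.05 level for DF=9.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    a, y = df / 2.0, x / 2.0
    if y <= 0:
        return 1.0
    # Power series for the regularized lower incomplete gamma function P(a, y):
    # P(a, y) = exp(-y) * y**a * sum_{n>=0} y**n / Gamma(a + 1 + n)
    term = 1.0 / math.gamma(a + 1.0)
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    lower = math.exp(-y) * y ** a * total
    return 1.0 - lower


chi_square = 9.34   # test statistic from the completed table
df = 10 - 1         # ten bags, so nine degrees of freedom
critical = 16.92    # critical number from the chi-square table for DF=9

p_value = chi_square_sf(chi_square, df)
reject = chi_square > critical
print(f"p-value = {p_value:.3f}, reject null hypothesis: {reject}")
```

The two views agree: a statistic below the critical number corresponds to a p-value above 0.05, so the null hypothesis is not rejected.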
Your assignment is:
- to investigate this hypothesis with the particular color of M&M assigned to your group.
- to document this as well as possible.
A video on how to do this on Excel can be found here.