Tuesday, July 23, 2013

M&M Epilog: Sex-Sex-Sex

I've commented from time to time that the standard deviation is the problem child of statistics because, relative to what introductory statistics students have done before, it is computationally intense.

Its calculation can be simplified if we make a study of the quantity that I called "Fred" in one of the earlier entries.  It is also sometimes called the second moment of x and referred to by S subscript xx.  When I say this aloud, it sounds like "sex, sex, sex."  That might make it more amusing to you, but it might create strange associations with math for you. It might be best not to think on this too much.

Below I've worked out the calculations that simplify Sxx.  If you are geeky enough to want to see how it's done, you are probably geeky enough to follow them.  If not, let me know.
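For readers following along in text, here is a sketch of the standard algebra behind the simplification, written in the frequency-table notation used in these posts. It assumes n is the sum of the f column and x-bar is the sum of the f times x column divided by n; it may not match the worked page line for line.

```latex
\begin{align*}
S_{xx} &= \sum f\,(x - \bar{x})^2 \\
       &= \sum f x^2 \;-\; 2\bar{x}\sum f x \;+\; \bar{x}^2 \sum f \\
       &= \sum f x^2 \;-\; 2\,\frac{\left(\sum f x\right)^2}{n} \;+\; \frac{\left(\sum f x\right)^2}{n}
          \qquad \text{since } \bar{x} = \frac{\sum f x}{n},\ \sum f = n \\
       &= \sum f x^2 \;-\; \frac{\left(\sum f x\right)^2}{n}
\end{align*}
```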


M&M Activity: Part 4

For this, we will again return to the data from the first table:
This time we are interested in the variation of the total number of M&Ms in each bag.  This is given in the row along the bottom. In order to find the average, we could simply take the total 177 and divide by 10 to get 17.7, but we will need the standard deviation too, and we will be doing something more complicated later. Therefore, we need to be a little more sophisticated.

Notice that in this table only four different values appear: 16, 17, 18, and 19.  I've made a frequency table with columns for x, f, x-squared, f times x, and f times x-squared:
It is comforting that the sum of the f times x column comes to 177 because that is what it was when I added it in the usual way. In my calculations at the bottom of the page, I get a value of x-bar=17.7 as before and I get a value of s=1.1 for the sample standard deviation.  The calculations all follow the same principles as for the single color.

Your answers for x-bar and s should be similar to what I got.  If you've followed my directions, you should all have a sample size of exactly 10.  This is a small sample.  When we learn more about taking samples, it will be shown that a sample size of 30 or more is to be desired.

However, there are several groups, and each of you took a sample of 10. That means if you pooled your samples, you would have a much larger sample. But the candy is all eaten now and it would be a lot of work to do this all again. If only there were some easy way.....

Well, in fact, there is an easy way.  The groups will now share their data with each other.  Let me show you how this works by showing you an example with some faked data.

I used faked data rather than run the experiment 5 more times.  When I faked the data, I fiddled around with the frequencies in a table similar to the one above and obtained values of the sum of the f column, the sum of the f times x column, and the sum of the f times x-squared column from each.  I put the results in the table on the sheet below:
I am being careful to tell you this is faked data because (1) we must practice intellectual honesty and (2) I don't want the M&Ms folks going all commando on me.

Once the data is gathered, the calculations are simple.  The sum of the Sigma f column is 50.  There are 5 groups and each took a sample of 10, so the pooled sample is 50.

The sum of the Sigma f times x column is the sum of all of the f times x entries in all of the groups.  For this fake data, it is 875.  Similarly, the sum of the Sigma f times x-squared column is equal to the sum of all of the f times x-squared entries in all of the tables.  With this fake data, it has a value of 15,412.

We calculate x-bar by dividing 875 by 50, which gives 17.5.  That is the sum of the pooled data divided by the total number of items in the pooled data.

Notice when we calculate "Fred," we give him a different name.  We use the variable Sxx.  When I say this out in class, it sounds like I'm saying "sex, sex, sex." 

Note that the n value is 50, so the n-1 value is 49.
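If you would rather let a computer do the bookkeeping, here is a minimal Python sketch of the pooling arithmetic. The function names `pool` and `mean_and_sd` are made up for this illustration. The per-group rows from the sheet are not reproduced here, so the example works directly from the pooled totals quoted above (50, 875, and 15,412).

```python
import math

def pool(groups):
    """Add up (sum f, sum fx, sum fx^2) triples, one triple per group."""
    n = sum(g[0] for g in groups)
    sum_fx = sum(g[1] for g in groups)
    sum_fx2 = sum(g[2] for g in groups)
    return n, sum_fx, sum_fx2

def mean_and_sd(n, sum_fx, sum_fx2):
    """Return (x-bar, Sxx, s) from the three pooled column sums."""
    x_bar = sum_fx / n
    sxx = sum_fx2 - sum_fx ** 2 / n      # shortcut formula for Sxx
    s = math.sqrt(sxx / (n - 1))         # divide by n - 1, then take the root
    return x_bar, sxx, s

# Pooled totals quoted above for the faked data.
x_bar, sxx, s = mean_and_sd(50, 875, 15412)
print(x_bar, sxx, round(s, 1))  # 17.5 99.5 1.4
```

With real data, each group would contribute its own triple to `pool`, and the result would be handed to `mean_and_sd`.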

Your assignment is two-fold:
  1. Do the calculations as described to calculate the sample mean and the sample standard deviation for the number of M&Ms in each bag.
  2. Share your data with the other groups; pool it; calculate the sample mean and the sample standard deviation from the pooled data.
Every group's answer on number 2 should be the same if every group does it correctly.  Each group should make its own submission.

Monday, July 22, 2013

M&M Activity: Part 3

Let us now revisit the original table of data:
Notice that I've added a Totals column to the far right and a Total row to the bottom.   Other than those additions, the data is exactly the same as I've been using.

Look along each row labeled by color.  Notice that there is variation.  In the Red row, for example, the values vary from 1 on the low end up to 4 on the high end.  I've separated this data out in the table below.


While there is variation, there is not all that much. We calculated in the last activity that the sample standard deviation is only 0.9 which is relatively small.  The range of the data (high minus low) is 3.  While there are rigorous techniques to justify this, it is not unreasonable to assume by just looking at it that each bag contains approximately the same proportion of red as any other.

We will now examine this assumption in more depth.

Notice that at the bottom of the Reds table, I've calculated a quantity that I've called Expected Proportion and have labeled it with the variable p-hat.  Note that p-hat is 17/177 which rounds off to 0.096.  This means that 9.6 percent of the M&Ms we counted are red.  The 17 is obtained as the total of the red row and the 177 is the total number of M&Ms.

The question we ask now is: are 9.6 percent of the M&Ms in each bag red?  How do we figure that when, as you may have noticed from our first table, the number of M&Ms in each bag varies from 16 to 19?  Well, if a bag contains 16 M&Ms and 9.6 percent of them are red, then 16 times 0.096 of them (about 1.5) are red.  I've done this calculation in the table below:
Compare each entry in the column labeled "Red" to the corresponding entry in the column labeled "Expected Red."  They don't differ too much, do they?
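The "Expected Red" column is just each bag's total multiplied by p-hat. Here is a minimal Python sketch of that step. The function name `expected_count` is invented for the illustration, and the loop runs over the four bag sizes that occur (16 through 19) rather than over the actual ten bags, whose individual totals appear only in the table.

```python
red_total = 17      # total of the Red row
grand_total = 177   # total number of M&Ms counted
p_hat = red_total / grand_total
print(round(p_hat, 3))  # 0.096

def expected_count(bag_size, proportion):
    """Expected number of one color in a bag of the given size."""
    return bag_size * proportion

# One line per bag size that occurs in the data.
for bag_size in (16, 17, 18, 19):
    print(bag_size, round(expected_count(bag_size, p_hat), 2))
```

The same function works for any other color: swap in that color's row total for `red_total`.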

I've carried out this same activity for all of the colors. The results are below.  In carrying out the calculations for Red to three decimals, I noticed they weren't all that much different than using two decimals, so in the subsequent examples, I've only gone to two decimal places.
Take the data that your group gathered, and process it as I have above.
  • Make a table of your original data, adding a total column and a total row as I have. 
  • You will need a separate table for each color of M&M.  
  • You will need to calculate a value of p-hat corresponding to each color.  It is suggested that at least two people, working independently of each other, perform the calculations involved.
  • You will need to submit your data in electronic form.

Friday, July 19, 2013

M&M Activity: Part 2

We are now going to process the data you collected and tabulated in the first part of this assignment.  Recall my data was as follows:
We will begin with the first row, the red M&Ms.  We will be learning how to deal with data in a frequency table.  The first step is to put that data into a frequency table.  There are a variety of ways to do this, but I am going to do this first row by ordering the data and then counting it.

Look first at the top part of the page.  In the first column, which is labeled "Raw data," I've simply listed the numbers in the same order as they occurred in the original table.  In the column labeled "Ordered data," I've listed them from the smallest to the largest.  This makes them easier to count.

In the part of the page labeled "Frequency Table," I've set up a five-column table with labels x, f, x2 (x-squared), fx (f times x), and fx2 (f times x-squared).  In the x column, I put the possible values of x from the lowest that occurs to the highest.  There were 5 bags that contained only one red M&M, there were 4 that contained 2, there were no bags that contained 3 (I could've left this row out if I wanted to), and there was one bag that contained 4 red M&Ms.

So here f stands for frequency. 

In the x2 column, I've put the squares of the values from the x column.  It kind of makes sense, eh?

In the column labeled fx (f times x), I've put the product of the value from the f column and the value from the x column: 5 times 1 = 5, 4 times 2 = 8, 0 times 3 = 0, and 1 times 4 = 4.  I've done a similar thing with the column labeled fx2 (f times x-squared): 5 times 1 = 5, 4 times 4 = 16, 0 times 9 = 0, and 1 times 16 = 16.

After filling in this table, I found the sums of the f, fx, and fx2 columns.  They are 10, 17, and 37, respectively. 
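As a cross-check on the hand-built table, here is a minimal Python sketch that builds the same five columns from the ordered red counts described above. The function name `frequency_table` and the tuple layout are choices made for this illustration only.

```python
from collections import Counter

# Ordered red counts for the 10 bags, as described above:
# five bags with 1, four with 2, none with 3, one with 4.
red = [1, 1, 1, 1, 1, 2, 2, 2, 2, 4]

def frequency_table(data):
    """Return rows (x, f, x^2, f*x, f*x^2) for every x from min to max."""
    counts = Counter(data)
    rows = []
    for x in range(min(data), max(data) + 1):
        f = counts.get(x, 0)  # a value that never occurs gets f = 0
        rows.append((x, f, x * x, f * x, f * x * x))
    return rows

table = frequency_table(red)
sum_f = sum(row[1] for row in table)
sum_fx = sum(row[3] for row in table)
sum_fx2 = sum(row[4] for row in table)

for row in table:
    print(row)
print(sum_f, sum_fx, sum_fx2)  # 10 17 37
```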

I knew before I started that--if I did everything right--the sum of the f column would be 10.  This is because we sampled 10 bags and the sum of the f has to be the sample size.

We can use the sum of the f column and the sum of the fx column to compute the sample mean, x-bar, which is 17/10=1.7.

We need the sum of the fx2 column to calculate the sample standard deviation.  As you may recall, the standard deviation is rather work-intensive to calculate.  When you first learn how to calculate the standard deviation, you first calculate the sample mean.  Then you put in a new column which consists of the difference between the value of the data item and the value of the sample mean. Then you put in a column which consists of the square of the previous column.  Then you add up that column.  Call the sum Fred (I just like the name) and divide that by the sample size minus 1 to get the sample variance, whose square root is the sample standard deviation.
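To make the long way concrete, here is a minimal Python sketch that follows those steps literally on the red data (five 1s, four 2s, and one 4). The variable names are chosen for this illustration.

```python
red = [1, 1, 1, 1, 1, 2, 2, 2, 2, 4]

n = len(red)
x_bar = sum(red) / n                        # step 1: the sample mean
deviations = [x - x_bar for x in red]       # step 2: data item minus mean
squared = [d * d for d in deviations]       # step 3: square each deviation
fred = sum(squared)                         # step 4: add up the squares
variance = fred / (n - 1)                   # step 5: divide by n - 1

print(round(x_bar, 1), round(fred, 1), round(variance, 2))  # 1.7 8.1 0.9
```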

This is a lot of work just to get Fred. It is also awkward, as it requires you to calculate x-bar beforehand.  However, there is a formula for Fred that removes the awkwardness.  This formula is hard to describe in typing, but I will give it a go. In your left hand, put the sum of the fx2 column.  In your right hand, put what you get when you divide the square of the sum of the fx column by the sum of the f column.  The left hand minus the right hand is Fred.
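In symbols, the left-hand/right-hand description can be written as follows, using the column names from the frequency table:

```latex
\text{Fred} \;=\; \sum f x^2 \;-\; \frac{\left(\sum f x\right)^2}{\sum f}
```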

Don't worry, it's written out on the page above: 37-(17 squared)/10=8.1.  To get the sample standard deviation from this, we go to the next page:
The sample variance is Fred divided by sample size minus 1.  That is 8.1/9=0.90.  The sample standard deviation is the square root of the sample variance. This is 0.949..., but we round it off to one decimal place beyond the original data, making s=0.9.
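The whole shortcut fits in a few lines of Python. This is only a sketch: the function name `sample_sd_from_sums` is invented here, and it takes the three column sums from the frequency table as its inputs.

```python
import math

def sample_sd_from_sums(sum_f, sum_fx, sum_fx2):
    """Return (x-bar, Fred, s) from the three column sums."""
    x_bar = sum_fx / sum_f
    fred = sum_fx2 - sum_fx ** 2 / sum_f   # left hand minus right hand
    s = math.sqrt(fred / (sum_f - 1))      # root of the sample variance
    return x_bar, fred, s

# Column sums for the red M&Ms: 10, 17, and 37.
x_bar, fred, s = sample_sd_from_sums(10, 17, 37)
print(round(x_bar, 1), round(fred, 1), round(s, 1))  # 1.7 8.1 0.9
```

Nothing here needs x-bar before computing Fred, which is the point of the shortcut.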

I've done this for each of the colors of my data:
Your assignment is to take your data and go through this same process.  You will be working in groups, so I suggest organizing your work so that at least two people do the calculations for each color independently.

The submission for each group may be organized as above.  In whatever way it is organized, it must include
  1. A table for each color of M&Ms with the five columns as above.
  2. A calculation of the sums of the f, fx, and fx2 columns.
  3. A calculation of x-bar (the sample mean) and of s (the sample standard deviation).

Thursday, July 11, 2013

M&M activity

To begin with, each group will need the following:
  1. A sack of Fun-Size bags of M&M's.
  2. Small gummed labels.
  3. Sharpie or felt-tipped pen.
  4. Scissors.
  5. Notebook
  6. Pencil
  7. Smart phone with a camera.

(1) Your sack of Fun Size M&M bags should have at least 10 small bags of M&Ms.  You will need at least 10 little bags for this activity.
If you cannot find Plain M&Ms, Peanut will do.  Be mindful if anyone in your group has a peanut allergy though.  If you can't even find Peanut M&Ms, then some other candy will work.  I've had students use Skittles, but the colors are different and the distribution of colors is different.  When you begin the process, you will need ten bags of the same type of candy and each bag of that candy must have a variety of colors.

(2) You will need gummed labels and (3) a Sharpie to number those gummed labels.

You will be using these labels to label 10 of your M&M bags as below:
You could possibly get by with masking tape and a pencil.  This would make it more difficult for me to read the picture you are going to take of this to prove that your group actually did the work.

You will also need (4) scissors to open each bag as below:

Yes, you could just rip it open. (WHY are you being so difficult?) If you rip it open, you might get carried away and spill the bag.  I know how you are under high-pressure situations.

Once you open the bag, you are going to sort the M&Ms by color, count them, and note the number of each color on a piece of paper like this:

The numbers along the top of the page correspond to the numbers you labeled each of the bags with.  The colors down the side correspond to the colors of the M&Ms.  The first bag contains 1 red, 5 orange, 3 yellow, 5 green, 2 blue, and 2 brown M&Ms. You can read this off of the column labeled with a 1 at the top.  This is why you need the (5) notebook and the (6) pencil.  Yes, you could carve it in clay with a stick if you like, but isn't this way easier?
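If your group wants to keep the tally in electronic form as well, one possible layout is sketched below in Python. The names `COLORS`, `bags`, `bag_total`, and `color_total` are made up for this example, and only bag 1 is filled in because that is the only bag spelled out above.

```python
COLORS = ["red", "orange", "yellow", "green", "blue", "brown"]

# Bag number -> counts by color. Only bag 1 is given in the text above;
# a group would add entries 2 through 10 from its own notebook page.
bags = {
    1: {"red": 1, "orange": 5, "yellow": 3, "green": 5, "blue": 2, "brown": 2},
}

def bag_total(counts):
    """Total number of M&Ms in one bag."""
    return sum(counts[c] for c in COLORS)

def color_total(all_bags, color):
    """Total number of one color across every recorded bag."""
    return sum(counts[color] for counts in all_bags.values())

print(bag_total(bags[1]))  # 18
```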

I would advise you to wash your hands before you start counting because you are going to eat them afterwards and if you don't wash your hands first you might catch something, but this is up to you.

Oh, yes, this will need to be done in such a way as to document that you did it as a group.  Using a smartphone with a camera would be the easiest way.  Some of you Facebook funerals, for Heaven's sake, so you ought to be able to come up with a smartphone among you or your friends.

Your complete assignment will be the data that you've collected and some documentation that provides evidence that everyone took part in some way. I leave this up to you.