Wednesday, October 28, 2015

A Heap of Papers


By Bobby Neal Winters
The other day I was walking around in one of the buildings on campus that is used whenever construction is being done somewhere.  The whole building, in its way, reminds me of the Room of Requirement in the Harry Potter books.  Whenever someone needs space, there it is.
While there isn’t anyone housed there currently--at least I didn’t think there was--it is an interesting place to walk around in.  You look in the various rooms and find all sorts of antiquated equipment.  There are old ditto machines, ancient computers, chalkboards, and implements to aid in writing on chalkboards.  The latter would include one of those 5-pronged chalk holders music teachers use (or at least used to use) to draw the scale on a blackboard and a chalk-holding protractor for geometry teachers.
It is also full of smells.  There is the sweet smell of machine oil; there is the smell of copier fluid from the ditto machines; there is the smell of dust.
There is also silence, silence broken only by the  sound of my footfalls on the floor, which is a checkerboard of aged linoleum tile and bare concrete.  
But then to my surprise I heard what seemed to be muttering.
At first I thought it was in my head, but as I turned my ears from side to side I found it was stronger in the direction of a door at the opposite end of the hall. Interested in seeing who else was in the building, I followed the sound until I came to the door.  When I got there, I heard the muttering quite clearly.  I knocked on the door.
“Hello,” I said as my knuckles rapped against the wood, dirtied by the sweat from the hands of many generations of students.
“Hum, ha, ah...ah...ah..come in,” stammered a voice from the other side.
Opening the door, at first a crack and then wider, I saw a thin man with hair the color of chalk.  His face was thin and wrinkled; his skin was as thin as rice paper; but his eyes were clear and his hands were steady.  Behind him was a stack of papers that he appeared to have just finished grading, as the copious red ink on the top paper had yet to dry.
“Hello,” I said again.  “I didn’t think there was anyone else over here.  I don’t recall any faculty being officed over here currently.”
“Well,” he paused. “I am still here.  They told me I could stay until I got my last set of exams graded.  I just finished so I should be vacating the premises soon.  I just want to order the exams from high score to low score so that I can rank them.”
Looking at him and seeing how old he was and looking at his stack of papers and seeing how many there were, I offered to help.
He refused this offer, however, saying he had a device.
“A device?” I echoed quizzically.
“Yes, look here,” he said indicating the back wall of the office.
Looking at the spot indicated, I saw what I’d originally taken to be an odd sort of plant stand.  It was roughly triangular in shape with the narrow part at the top.  The top consisted of a single plate, suspended from the ceiling.  Below that plate were two separate plates; below each of them two more for a total of four on that level.
Going down, each level had twice as many plates as the level above.  This went on until there were six levels in all.  There were a total of
1 + 2 + 4 + ... + 32 = 63
plates altogether.
He began to place his papers on the device in the following odd sort of way.  He put the first paper on the top.  It had received a 64.  The next one had received a score of 63; he compared that to the 64 and, as it was less, he put it on the plate hanging just below it to the left.  The next one would have gone on the plate to the right, but it had a score of 69; so he took the 64 and put it there and put the 69 at the top, where the 64 had been.
The next paper was poised to go on the extreme left plate of the third level from the top, but it was a 72.  He took the 63 from the plate above it and placed it in the left plate instead.  Then he checked the value of the paper in the plate above that one; it was a 69, so he put that in the recently empty plate and put the 72 in the top plate in its stead.
After seeing him go through his motions a few more times, I noted the following:  Every paper had a lower score than the paper in the plate just above it.  As a consequence of this, when a given step was finished--that is, when a given paper had reached its final destination--the paper with the highest score so far occupied the top plate.
It didn’t take him too long to get all his papers into the device. At that point, I noted the high score was an 84.  I looked at it and noted that it was written in blue ink and bore the scent of something that had been copied using a ditto-master.
I didn’t have long to look, however, as he took that paper and laid it face down on his desk.  The top plate was now empty. He proceeded to take the rightmost paper from the bottom row and put it in the top plate.  He then looked at the two papers below and swapped it with the paper that had the higher score of those two.  Following the paper he’d brought up from below, he repeated this process, i.e. he switched it with the larger of the two papers below it.  When this could no longer be done, i.e. when it was larger than the two papers below it, he stopped.  
Each of the papers was positioned so that its score was larger than the scores of the papers in the plates below it. Therefore, the paper in the top plate again had the highest score.  He removed it and put it face down on the previous paper.  Then he repeated the process again and again until all of the papers were stacked face down, atop each other, on his desk. When he turned the stack over, they were in order with the highest score on top and the lowest score on bottom.

The Technical Part (Those only interested in the story may scroll to the end)

What he had done with his papers is called a heap sort.  The condition of having all values stored in the relative positions described, in which each value is at least as large as the two values below it, is called a heap.
The heapsort is typically taught in the context of computer science and the technical details of computing using pointers, arrays, subscripts, and so forth can get in the way of understanding what is, essentially, a fairly simple technique.  I admired how this fellow’s simple device made the concept clear.  Once the concept is clear, it is not difficult to proceed to the technical details of dealing with array subscripts etc.
The algorithm proceeds in two steps.  The first is the building of the heap in which the largest element is found.  The second is disassembling the heap and reordering the remainder until nothing is left and everything is ordered.
The heap has the general structure of what is called a complete binary tree.  A tree has nodes, which were the plates in the story above.  It is arranged from top to bottom with the top node, oddly enough, referred to as the root. Below it are two--this is where the binary comes in--nodes which are referred to as its children.  The root is their parent. Both of these nodes have two children of their own.  They will be referred to as the left child and the right child in the obvious way. Each generation is a level.  Saying the tree is complete is saying that every level but the last is completely filled.
Values are placed in the nodes in the manner described above preserving the property that the value in each node is at least as large as the values in the children nodes.  The left child and the right child don’t have to have any particular relationship to each other; they are simply each lower than the value in the parent node.
One cannot, of course, literally build the professor’s device inside a computer, but it is possible to build something that is functionally equivalent.  We can do this by imposing some structure on an array.  For those of you who don’t know, an array is an object in a programming language that is like a row of numbered mailboxes.

The numbering on the left of the picture above indicates the index.  The empty space on the right indicates a place for the value to go.  If an array is called A, the value stored at the i-th spot of the array is A[i].  In the picture, I’ve numbered the array beginning with 1 as is done in the computing language R.  Most other computing languages begin the indexing with 0.
We capture the structure of the binary tree by letting the 2i-th position and the (2i+1)-st position be the children of the i-th position. Conversely, floor(i/2) is the parent position of i. (This must be modified slightly to work if your array’s index begins with 0.)  This structure is indicated in the figure below:

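In a 0-indexed language like Python, the same bookkeeping shifts by one: the children of position i sit at 2i+1 and 2i+2, and its parent at floor((i-1)/2).  A minimal sketch (the helper names are mine):

```python
def left(i):
    """Index of the left child of node i (0-indexed array)."""
    return 2 * i + 1

def right(i):
    """Index of the right child of node i (0-indexed array)."""
    return 2 * i + 2

def parent(i):
    """Index of the parent of node i (0-indexed array)."""
    return (i - 1) // 2

# The root (index 0) has children at 1 and 2; node 1 has children at 3 and 4.
print(left(0), right(0))    # 1 2
print(parent(1), parent(2))  # 0 0
```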
When you begin the process of creating a heap, think about it as if your values are in an array. Begin at the top and start sliding down. Compare the value at each position with its parent’s value. Keep swapping it upwards until it is as high as it can go, and then proceed down to the next value and do the same there.  This is equivalent to the procedure described above, but less colorful in its description.
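The professor’s plate-by-plate insertion can be sketched as follows; this is an illustrative Python version with 0-indexed positions, not the device itself:

```python
def sift_up(heap, i):
    """Swap the value at i upward until it no longer beats its parent."""
    while i > 0 and heap[i] > heap[(i - 1) // 2]:
        p = (i - 1) // 2
        heap[i], heap[p] = heap[p], heap[i]
        i = p

def build_heap(scores):
    """Place papers one at a time, the way the professor loaded his plates."""
    heap = []
    for s in scores:
        heap.append(s)                 # the next open plate
        sift_up(heap, len(heap) - 1)   # swap upward while it beats its parent
    return heap

# The scores from the story: 64, 63, 69, then 72.
heap = build_heap([64, 63, 69, 72])
print(heap[0])  # 72 -- the highest score so far sits on the top plate
```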
When the heap is created, the largest value is in A[1].
Thus ends the first part of the algorithm.
The second part consists of shoveling that largest value off the top, replacing it with the final value from the array, and then restoring the heap-ness of the remaining array.  This uses a function appropriately called heapify.  The syntax is heapify(array, root) and it is called recursively.
When we remove the value A[1] and replace it with the final value stored in the array, it does destroy the heap-ness of the array.  However, if you consider the children of the root as being roots of their own trees, those trees are still heaps themselves. Heapify compares the value at the root with its two children.  If it is at least as large as both, we have a heap and stop; otherwise it swaps the root value with the larger of its two children, calls itself with that child’s position as root, and so on until we have a heap.
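Putting the two parts together, here is a sketch of the whole algorithm in Python.  It builds the heap with heapify from the bottom up, a standard shortcut, rather than by the plate-by-plate insertion of the story, and I carry an extra argument n for the portion of the array still in the heap.  It sorts in place into ascending order:

```python
def heapify(a, n, root):
    """Restore the heap property for the subtree at `root`, within a[0:n]."""
    largest = root
    left, right = 2 * root + 1, 2 * root + 2
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != root:
        a[root], a[largest] = a[largest], a[root]
        heapify(a, n, largest)   # recurse on the child we swapped into

def heapsort(a):
    n = len(a)
    # Build the heap: heapify every non-leaf node, from the bottom up.
    for i in range(n // 2 - 1, -1, -1):
        heapify(a, n, i)
    # Repeatedly move the top (largest) value to the end of the array.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        heapify(a, end, 0)

scores = [64, 63, 69, 72, 84, 58, 77]
heapsort(scores)
print(scores)  # [58, 63, 64, 69, 72, 77, 84]
```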

Now we are back

All of this went through my head in the moment it took him to turn his papers over.  When he did, I happened to see the name on the paper with the top score.  It was the name of the retired chair of our Computer Science Department.  I looked briefly from side to side and noted that there was not a computer on this man’s desk.
I now looked at him again and noted that I’d seen him before: It was when one of our history faculty had given a presentation on our university in the 1960’s.  
That was weird.
“Ah,” I said, “so are you going to move to your new office now that your papers are graded?”
“Well,” he said with a sad expression. “One of the things about the place I am now is that the papers are never graded, the temperature is never comfortable, and the students show up outside of office hours.”
As he said this, he became first translucent and then transparent and then he was gone.  The papers he was holding dropped to the floor and scattered.  I leaned over to pick one up, and instead of smelling like ditto fluid, it smelled like dust.
I turned on my heel and beat a hasty retreat.  As I went down the hall, I passed a room that had an old skeleton from biology class stored in it.
It waved at me.
Happy Halloween.

Monday, May 18, 2015

Oh Give Me a Cone


By Bobby Neal Winters
The shortest distance between two points is a straight line.  Remember that even when it is not true.  And there are times when it is not true.  The surface of the planet Earth would be one example.  The world, as has been known since the Greeks, is a sphere. (There is a group of hyper-correcting sphincteroids out there who will say the earth isn’t a sphere but an oblate spheroid, but quite frankly it is so close to a perfect sphere that you can’t tell without some pretty accurate measurements. But I digress.)
In any case, the world is a sphere, and there are no straight lines on it because spheres are round as round can be.  I want you to think about two points on the surface of a sphere.  The center of the sphere is a third point.  These three points do not lie on a straight line, so they determine a plane.  That plane will cut through the sphere in a circle and that circle is called a great circle.  Are you with me so far?  If not, go back and read this paragraph until you are.  Have you done that?  Okay, then, let us proceed.
Any two points on a circle will cut it into two arcs.  The shorter of the two arcs is the shortest path between the two points.  This is a nice example.  The solution is elegant.  Everything can be stated in simple language.  We walk away from it loving life, mathematics, and geometry while we whistle zippity doo dah.
We now wish to apply our mathematical skills in other areas.  This leads us to the concept of a surface.  Surfaces are mathematical constructs that generalize the notion of the plane (where the shortest path between two points is a line segment) and the sphere (where the shortest path between two points is an arc of a great circle).
It is at this point in the discussion of surfaces that one typically runs off into a discussion of toruses (tori) of the one-holed, two-holed, and n-holed variety.  That would be an interesting discussion, but I want to talk about something that seems, at first blush, to be simpler: the cone.  I was drawn into this by the Facebook discussion of some intelligent amateurs (in the literal sense of that word) who got considerably more than they bargained for.  The same has happened to professionals.
The question is: what is the shortest path between two points on the surface of a cone?
Before we go on to engage this question, I would like to lay down some information on the cone.  The cone is known from antiquity.  I suppose that anyone who had papyrus or vellum laying around wouldn’t have to use too much imagination to roll it into a cone.  The ancient Greeks in the centuries before Christ did work with what we call the conic sections, discovering that if you intersect a cone with a plane at various angles you get ellipses, hyperbolas, and parabolas. These are very important curves and became even more important when Isaac Newton discovered that objects under the influence of gravity follow such curves.
So you take what you know about the great circle and shortest distance, you add to that what you know about how the plane interacts with the cone, and it is quite natural to form the belief that the shortest distance between two points on a cone will be the shortest arc of some conic section.
It is also quite wrong.
Let us now engage with the issue of the shortest path between two points on a cone.  In some sense, this is an easy question.  Mark the two points and then take your mathematical scissors out to cut from the base of the cone to the point of the cone in such a way as to not cut the shortest path.  You can then roll the cone out flat on the plane.  We can then see the shortest path is a line segment in the plane, just as we said in the first paragraph.  
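The cut-and-unroll recipe can be turned into arithmetic.  If a point on the cone is recorded by its straight-line distance r from the cone point and its angle phi around the cone, then unrolling the cone (made from a planar sector of total angle theta) turns the shortest path into a segment, and the law of cosines gives its length.  A sketch, with names and conventions of my own choosing:

```python
import math

def cone_distance(r1, phi1, r2, phi2, theta):
    """Shortest-path length between two points on a cone made by rolling
    up a planar sector of total angle theta.  Each point is given by its
    straight-line distance r from the cone point and its angle phi
    (0 <= phi < theta) around the cone."""
    gap = abs(phi1 - phi2)
    gap = min(gap, theta - gap)      # go around the cone the short way
    if gap >= math.pi:
        return r1 + r2               # shortest route passes through the cone point
    # Unrolled flat, the two points and the cone point form a triangle;
    # the law of cosines gives the straight-line side.
    return math.sqrt(r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * math.cos(gap))

# Two points at distance 1 from the cone point, a quarter turn apart,
# on the cone made from a half plane (theta = pi):
print(cone_distance(1.0, 0.0, 1.0, math.pi / 2, math.pi))  # sqrt(2) = 1.4142...
```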
This is not something that can be done with the great circle on the sphere.  The sphere has a property called constant positive curvature.  The cone, even though it looks curved, has zero curvature, like the Euclidean plane. (The hyperbolic plane has constant negative curvature; this means it looks like a Pringles potato chip at every point, but a discussion of this would take us too far afield again.)
So when we have the cone rolled out flat on the plane, we see that all we have to worry about are lines, but it is still possible that, when we roll the cone back up in space, that straight line will yield a conic section of some sort.  In order to show that our hopes and dreams of such an elegant solution are for naught, all I have to do is come up with one curve on the surface of a cone, obtained from a line in the way described, that is not a conic section.
Consider the cone obtained from folding the half plane in such a way that the origin becomes the cone point.  
Now consider a vertical line that is parallel to the y-axis at a distance of one unit.  On the cone, this curve looks like it might be a parabola or a branch of a hyperbola at first glance, but it can’t be.  It has two ends, and those two ends remain within two units of each other.  In either a parabola or a hyperbola, the two ends become infinitely far apart.
When I first began playing with the problem of geodesics on a cone (geodesics are the technical name for the type of curve that gives the shortest path between points), I’d thought that the circles on the cone that are centered at the cone point would be geodesics.  I was quickly disabused of this idea, however, when I observed that the chords of these circles (when they are laid flat on the plane) are shorter than the subtended arcs on those circles.
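The observation that sank this idea is just the inequality 2r·sin(a/2) < r·a for any arc angle a > 0: in the unrolled plane, the chord is always shorter than the arc it subtends.  A quick numerical check with r = 1:

```python
import math

r = 1.0
for a in [0.5, 1.0, 2.0, 3.0]:        # arc angles in radians
    chord = 2 * r * math.sin(a / 2)   # straight-line shortcut in the unrolled plane
    arc = r * a                        # length along the circle itself
    print(f"angle {a}: chord {chord:.4f} < arc {arc:.4f}: {chord < arc}")
```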
So this was a failed mathematical experiment, but it leads to a thought experiment.  Any geodesic will be a line in the plane when the cone is unrolled.  Consider now the family of circles centered at the cone point.  Begin at the cone point, and let the circles increase in size from radius 0 until one makes first contact with the geodesic.  That initial point of contact (thinking of the cone point as being at the top) will be a maximum point of the geodesic.  Let the circle increase in size until it is just past the maximum point of the geodesic.

When we lay this flat, the geodesic will be a chord of the circle centered at the cone point.  Take the perpendicular bisector of the chord through the cone point and cut along it.  From this we see that any geodesic for the cone can be obtained from a line perpendicular to a radius of the cone.
A little playing with this will show you that how steep the cone is matters a lot for the type of geodesics you get. For example, if you make a cone from a quarter plane, the geodesics will cross themselves.

Friday, March 6, 2015

Lines in Pleasant Places


By Bobby Neal Winters

Lord, you alone are my portion and my cup;
   you make my lot secure.
The boundary lines have fallen for me in pleasant places;
   surely I have a delightful inheritance.
I will praise the Lord, who counsels me;
   even at night my heart instructs me.
--Psalms 16:5-7, NIV

It is rumored that deep within the Ozarks there is a place where those interested go to learn the art of making moonshine.  It is called the Moonshine Academy.  Every year a new class of students enters the academy, but not everyone decides to come back the following year.  The proprietors of the Academy are concerned about this because their aim is to spread the art of moonshine making as far and wide as possible.  Those who leave the Academy tend to become Revenuers. The proprietors would like to know whether there might be a way to tell ahead of time which of their new students are at risk of becoming Revenuers.
When each student comes to MA, a record is made of the length of their beards and the number of missing teeth.  The proprietors wish to make predictions based on those two characteristics, having noted that those who have become successful as moonshiners have certain standards with respect to them.  

Some Math

There are two groups, those who succeed and those who don’t. Call the group that succeeds the Elect and denote them by E. We will denote the rest by R, for Revenuer.   We will begin the discussion by assuming a simpler situation than what we have and supposing there is only one characteristic or, to use the standard nomenclature, random variable to measure.  Call it X.
Random variables are functions that take values on populations.  We will write X|E for X restricted only to members of the Elect and X|R for X restricted only to revenuers.  We hope that we won’t be piling on too much notation if we let mean(X|E) denote the mean of X for members of the Elect and let mean(X|R) be the mean of X for members of the Revenuers.
Suppose that we find mean(X|R) < mean(X|E), as in the figure below.
[Figure: two overlapping normal curves]
In the figure, we note mean(X|R)=3 and mean(X|E)=5.
In our dreams, we want a magic number C (for cutoff). Say a student named Maynard walks in and his random variable is X(Maynard).  We would like to be able to say that if X(Maynard) > C, then Maynard will be one of the Elect, and if X(Maynard) < C, then he will be a Revenuer.
As Yogi Berra said, “Predicting is hard, especially about the future.” In making a prediction about Maynard’s future using only the value of X(Maynard), one procedure would be to simply let this magic number C be the midpoint between mean(X|E) and mean(X|R) and predict using the method described above, with the realization that you are going to be wrong some of the time.
This is the same as predicting Maynard to be in the group E if X(Maynard) is closer to the mean of X on E than it is to the mean of X on R, and to be in R otherwise.
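As code, the one-variable rule is a single comparison.  A sketch using the means from the figure, mean(X|R) = 3 and mean(X|E) = 5 (the function names are mine):

```python
def midpoint_cutoff(mean_r, mean_e):
    """The magic number C: the midpoint between the two group means."""
    return (mean_r + mean_e) / 2

def predict(x, mean_r, mean_e):
    """Predict Elect when x is past the cutoff, Revenuer otherwise."""
    return "Elect" if x > midpoint_cutoff(mean_r, mean_e) else "Revenuer"

# With mean(X|R) = 3 and mean(X|E) = 5, the cutoff is C = 4.
print(predict(4.5, 3, 5))  # Elect
print(predict(3.2, 3, 5))  # Revenuer
```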

Some More Math

Recall that the folks at the Moonshine Academy had two random variables they kept on each of their students: beard length and number of missing teeth.  Mathematicians like to abstract, so we can think of each of the students at MA as a point in space, with the values of the random variables as coordinates for that point.  We can then plot them on an X-Y axis.  The results might look like those below:
As you see here, the data separates into two groups.  There is one in the lower left hand corner that appears to be centered at about (1,1) and another in the upper right that appears to be centered at about (2,2).
Notice that these two groups are more clearly separated on the 2-dimensional plane than either would be if we simply ignored one of the random variables.   Ignoring the vertical component, for example, would cause collisions between quite a few members of the two groups and there would be considerably more overlap between the 1-dimensional projections of the two groups.
If only there were some way to rotate the picture so that the line connecting the centers of these two groups aligned with the horizontal.

Even More Math

There is, in fact, a way to perform such a rotation. One could apply a rotation matrix.  That’s right, matrices have uses beyond just being things to row-reduce to solve linear systems of equations.  Indeed, their real purpose is that of geometric transformation.  If you multiply a line by a matrix (by which I mean each point on that line), the result will be either a line or a single point.  If the determinant of that matrix is not zero, then the result will always be a line.  What is more, if you multiply a parallelogram of area A by a matrix of determinant D, you will get a parallelogram of area |D|xA.  And that’s what determinants are all about. Pretty cool if you ask me.
In this particular case, we probably could get away with multiplying by a rotation matrix that we pick by inspection.  However, there is a standard procedure, which we will describe in excruciating detail later, wherein we can obtain the matrix.  At this point, let’s pretend we’ve gone through it and obtained a matrix M.  We then take a point with coordinates (x1,x2) and put it in the row matrix Y=[x1, x2].  We can then obtain the coordinates of the transformed point as YxM, using matrix multiplication.   Applying this to every point gives the picture below:
Note that these two blobs are fairly well separated. One could simply forget about the vertical component  and project the points onto the horizontal axis in order to go through the procedure described at the first of this article.
I like to think of the groups in the above picture as spongy clouds.  They have centers that you can obtain simply by taking the averages of all of the coordinates of each.  The centers of the groups above are exactly the same as the rotated versions of the centers of the groups in the previous picture.
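The row-vector-times-matrix convention above can be sketched with an ordinary rotation matrix; the angle here is my own choice for illustration, not one computed from the data:

```python
import math

t = -math.pi / 4                      # rotate clockwise by 45 degrees
M = [[math.cos(t),  math.sin(t)],
     [-math.sin(t), math.cos(t)]]     # rotation matrix for row vectors

def transform(point, M):
    """Row vector times matrix: the point Y = [x1, x2] becomes YxM."""
    x1, x2 = point
    return [x1 * M[0][0] + x2 * M[1][0],
            x1 * M[0][1] + x2 * M[1][1]]

# A point on the line connecting the centers (1,1) and (2,2):
rotated = transform([1.0, 1.0], M)
print(rotated)                        # approximately [1.4142, 0]

# A rotation has determinant 1, so areas (and distances) are unchanged.
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(round(det, 6))                  # 1.0
```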

The Process from Thirty-thousand Feet

I am now going to give you the big picture by telling a few lies.  Later in the article, if you are still awake, I will provide as many details as I can.
Suppose that a new student comes to seek admission to the Moonshine Academy.  The folks at the Admissions Office will measure his beard and count the number of teeth he has missing. (Sometimes they will count the number of teeth he has and subtract that from 32 because it is easier, but I digress.)   They will then measure the distance between that point and the centers of the two groups.  The student will be associated with the group to whose center he is closer.  That is, if he is closer to the center of the Elect group he will be predicted to be Elect; if he is closer to the center of the Revenuer group he will be predicted to be a Revenuer.
The word predicted is very important in the paragraph above.  Predictions can be wrong.  Indeed, even when you use a procedure on data where you know the answers, some of the predictions will be wrong.  For example, for the data above we get the following table:
Predicted/Actual Elect Revenuer
Elect 799* 2
Revenuer 1 198*

This says that my model predicted 801 (799 + 2) to be in the Elect; of these, 799 were, but 2 actually flunked out and became Revenuers.  It predicted 199 (1 + 198) to become Revenuers.  The model was right in 997 (799* + 198*) of its predictions on its founding data. (That is why those numbers are marked with a *.)   These are incredibly good results, and this will certainly not be the case with all data sets.

What Happened to the Rotation?

You might have noticed that I’d made a big deal about rotating the data to better see how nicely separated the groups were along the horizontal axis, but when I described the process the groups weren’t actually rotated.  This is because you don’t have to rotate for the test. The rotation does not change distances.  Whichever center is closest in the rotated picture is also closest in the unrotated picture.
However, once the initial test is run and you want to make predictions about candidates for next year it is more convenient to have a formula.  For the data above the formula would be: f(bl,mt)=0.982*bl+0.188*mt. Here, bl=beard length and mt=missing teeth.  It is easier to determine whether f(bl,mt)>2.330 than it would be to calculate the distance to each of the centers of the groups.  
And, as a student who would want to look good with respect to these standards, it is easier to tell that you’d be best advised to spend your time growing your beard as opposed to pulling your teeth.

How Do We Get That Formula?

To get to that formula, we must enter into the jungle of matrices where many enter and few emerge unchanged.
The first matrix I want to introduce you to is the covariance matrix. There is far more to know about it than what I am going to tell you.  This is how you get a covariance matrix.  Let X be a matrix each of whose columns contains the values of one random variable.  So column one will contain the beard length of each of our students and column two will contain the number of missing teeth of the same students. If you have more random variables to study, you will have more columns.  For us, with the beard lengths and the missing teeth of 1000 students, X is a 1000x2 matrix.
Now, calculate the mean of each column and subtract that number from each entry of its column.  Call this new matrix Y. So each element of Y will be the difference between the value of a particular random variable for a particular student and the mean of that random variable for all the students.  Now the covariance matrix of X, cov(X), is transpose(Y)*Y.  For us it will be a 2x2 matrix.  Note that cov(X) is symmetric; it is important.
Now, forget about this particular matrix because we are not going to use it, but remember how we got it.
Let XE be the data for the Elect and let XR be the data for the Revenuers. Let pE be the proportion of the data that is from the Elect and pR be the proportion from the Revenuers.  Let Sw=pE*cov(XE)+pR*cov(XR). Note that Sw is symmetric; it is important.
Now remember that first matrix X, the one I told you to forget about.  We are going to modify it.  Instead of each entry being the value of the random variable minus the mean of the whole column, let it be the mean of the column for that student’s group minus the mean of the whole column.  That is to say, if the student from row i is in the Elect, his entry in the Beard Length column will be the average beard length for the Elect minus the average beard length for everybody.  Then Sb=cov(X).  Note that Sb is a symmetric matrix; this is important.
Now let M=inverse(Sw)*Sb.  Strictly speaking, a product of symmetric matrices need not be symmetric, but because Sw is positive definite, M still has real eigenvalues and a full set of eigenvectors, which is what we will need.  This is important. Write it on your hand with a sharpie.
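The recipe for Sw, Sb, and M can be sketched in a few lines of numpy.  The data below are invented stand-ins for the Moonshine Academy numbers; only the recipe follows the text:

```python
import numpy as np

# Toy data: rows are students; columns are beard length and missing teeth.
# (The numbers are invented for illustration.)
XE = np.array([[2.0, 2.0], [2.2, 1.8], [1.8, 2.2], [2.1, 2.1]])  # the Elect
XR = np.array([[1.0, 1.0], [0.9, 1.2], [1.1, 0.8], [1.0, 1.1]])  # the Revenuers
X = np.vstack([XE, XR])

def cov(data):
    """transpose(Y) * Y, where Y is the data with each column's mean removed."""
    Y = data - data.mean(axis=0)
    return Y.T @ Y

# Sw: the proportion-weighted sum of the within-group covariance matrices.
pE, pR = len(XE) / len(X), len(XR) / len(X)
Sw = pE * cov(XE) + pR * cov(XR)

# Sb: replace each student's row by (his group's mean - the grand mean),
# then take the same transpose(B) * B product.
grand = X.mean(axis=0)
B = np.vstack([np.tile(XE.mean(axis=0) - grand, (len(XE), 1)),
               np.tile(XR.mean(axis=0) - grand, (len(XR), 1))])
Sb = B.T @ B

M = np.linalg.inv(Sw) @ Sb
print(Sw)
print(Sb)
print(M)
```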
We are about to digress.

Eigen What?

A discussion of eigenvalues and eigenvectors would take us too far afield.  One or more of us would be dead by the time I finished.  Suffice it to say that the theory is beautiful in the same way the Arctic is: The beauty is real, but seeing it face to face comes at a cost.
It is enough for us to know the following. For our matrix M there is a matrix P and a diagonal matrix D such that M=P*D*inverse(P).  Furthermore, the matrix P will consist of the eigenvectors of M, whatever the hell they are, and the diagonal entries of D will be the eigenvalues of M, whatever the hell they are.  Take the eigenvector corresponding to the eigenvalue of largest absolute value and multiply the coordinates of the student by it, and you have your formula.  For us that is [bl , mt] * transpose([0.982, 0.188]).  To get the critical point C, one applies this formula to the midpoints of each of the groups and takes the average of the two.
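Given M, a library eigendecomposition hands back the eigenvalues and eigenvectors at once.  A sketch with a small made-up stand-in for M (the matrix, and hence the direction it produces, is invented; only the procedure follows the text):

```python
import numpy as np

# A small symmetric stand-in for M = inverse(Sw)*Sb; the entries are invented.
M = np.array([[3.0, 1.0],
              [1.0, 3.0]])

vals, vecs = np.linalg.eig(M)        # columns of vecs are the eigenvectors
top = int(np.argmax(np.abs(vals)))   # eigenvalue of largest absolute value
w = vecs[:, top]
if w[0] < 0:
    w = -w                           # an eigenvector is only defined up to sign

def f(bl, mt):
    """Score a student: [bl, mt] times the chosen eigenvector."""
    return float(np.array([bl, mt]) @ w)

# The cutoff C: apply f to the center of each group and average the two.
center_E, center_R = (2.0, 2.0), (1.0, 1.0)
C = (f(*center_E) + f(*center_R)) / 2
print(f(2.1, 2.0) > C)  # True: this student falls on the Elect side
```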


Balakrishnama, S., and A. Ganapathiraju, “Linear Discriminant Analysis--A Brief Tutorial,” Institute for Signal and Information Processing.

Klecka, William R., Discriminant Analysis, Sage University Paper Series: Quantitative Applications in the Social Sciences, 1980.

Friday, January 23, 2015


Khan Academy does a nice job of explaining ANOVA at this link.  This is in fact where I learned it.  Below I have a nice application of ANOVA using M&Ms that I would like to share.

There are numerous tables below which can be glided over without any loss of understanding.  Indeed, if you just read the prose in between the tables you will be far better off.

In the Fall of 2014, I assigned a series of activities to my Elementary Statistics class involving M&Ms.  These activities begin here. There were six groups of students involved and each group took a sample of ten bags of M&Ms.  These samples are listed below.

Group 1
Color/Bag 1 2 3 4 5 6 7 8 9 10
Red 2 5 2 4 3 7 3 1 7 5
Orange 7 3 6 5 6 7 5 7 6 9
Yellow 2 3 0 1 2 0 2 4 3 4
Green 3 0 6 6 3 2 1 3 4 2
Blue 3 5 4 2 4 7 4 5 4 5
Brown 1 1 1 0 1 3 1 1 1 3

Group 2
Color/Bag 1 2 3 4 5 6 7 8 9 10
Red 2 2 3 2 2 3 1 6 3 3
Orange 3 4 0 4 4 4 3 2 2 3
Yellow 2 5 5 1 1 1 0 1 2 4
Green 3 1 5 4 4 3 4 2 4 2
Blue 3 2 2 4 3 3 6 1 4 4
Brown 3 2 1 1 2 2 2 4 1 1
Group 3
Color/bag 1 2 3 4 5 6 7 8 9 10
Red 1 1 1 0 2 3 0 4 0 0
Orange 2 0 1 2 5 6 5 3 1 0
Yellow 1 1 2 0 3 5 0 7 1 3
Green 1 3 1 2 2 1 5 4 1 1
Blue 2 2 4 2 4 5 5 2 2 1
Brown 1 0 0 1 2 0 3 1 2 1
Group 4
Color/Bag 1 2 3 4 5 6 7 8 9 10
Red 4 5 1 3 1 0 2 2 3 2
Orange 1 1 7 3 3 6 3 5 4 6
Yellow 1 4 6 0 3 2 3 0 3 4
Green 2 3 1 8 4 4 5 6 1 3
Blue 5 1 1 5 4 2 3 3 3 2
Brown 2 3 4 1 3 3 2 1 2 0
Group 5
Color/Bag 1 2 3 4 5 6 7 8 9 10
Red 2 3 1 5 5 2 0 1 5 5
Orange 2 0 1 3 2 3 3 1 5 3
Yellow 6 2 5 2 3 4 7 7 1 5
Green 2 5 5 7 1 2 4 6 5 1
Blue 1 5 6 2 4 5 4 1 2 3
Brown 3 6 0 1 3 2 3 2 1 1
Group 6
Color/Bag 1 2 3 4 5 6 7 8 9 10
Red 1 2 2 4 3 3 1 2 1 7
Orange 3 3 1 1 3 1 3 2 1 2
Yellow 3 5 3 3 2 2 3 6 4 4
Green 3 1 7 3 3 6 2 2 4 2
Blue 4 4 1 4 3 3 3 4 4 3
Brown 2 1 4 1 4 2 4 2 5 1
This is a nice collection of real data and my thought was to make the most of it.  As a sample size of ten is small, my thought was to pool the data, but before this can legitimately be done, it must be justified.  One might argue that since all of the samples were taken from M&M's, that might be justification enough, but I had lingering doubts.  What if the proportion of each M&M color is not consistent from batch to batch? What if M&Ms are put out in a variety of Fun Sizes?  What if my students had just royally goofed?  In order to be careful, I decided that after having taught elementary statistics for twenty years it was time to learn ANOVA.

I first wanted to get a good confidence interval for the average number of M&Ms per bag. I calculated that using the data for each group, finding the bag-by-bag totals.  I put those into the following table:

Bag/Group 1 2 3 4 5 6
1 18 16 8 15 16 16
2 17 16 7 17 21 16
3 19 16 9 20 18 18
4 18 16 7 20 20 16
5 19 16 18 18 18 18
6 26 16 20 17 18 17
7 16 16 18 18 21 16
8 21 16 21 17 18 18
9 25 16 7 16 19 19
10 28 17 6 17 18 19
I then calculated the mean for each of the groups individually and the grand mean of the total pooled data.  Using this, I calculated the sum of the squares for differences within each of the groups and the sum of the squares for differences between the groups.  Those calculations are in the table below:

Group 1 2 3 4 5 6
Mean 20.7 16.1 12.1 17.5 18.7 17.3
Grand mean: 17.07

Squared differences within groups, (bag total - group mean)^2, with column sums in the last row:
Group 1 2 3 4 5 6
7.29 0.01 16.81 6.25 7.29 1.69
13.69 0.01 26.01 0.25 5.29 1.69
2.89 0.01 9.61 6.25 0.49 0.49
7.29 0.01 26.01 6.25 1.69 1.69
2.89 0.01 34.81 0.25 0.49 0.49
28.09 0.01 62.41 0.25 0.49 0.09
22.09 0.01 34.81 0.25 5.29 1.69
0.09 0.01 79.21 0.25 0.49 0.49
18.49 0.01 26.01 2.25 0.09 2.89
53.29 0.81 37.21 0.25 0.49 2.89
156.10 0.90 352.90 22.50 22.10 14.10

Squared differences between groups, (group mean - grand mean)^2, with column sums in the last row:
Group 1 2 3 4 5 6
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
13.20 0.93 24.67 0.19 2.67 0.05
132.01 9.34 246.68 1.88 26.68 0.54
The SSW sums to 568.6 and the SSB sums to 417.1.  The numerator has m-1=5 degrees of freedom as we are comparing m=6 groups. The denominator has m*(n-1)=6*(10-1)=54 degrees of freedom as each of those groups took a sample of size n=10.  This gives an F test statistic of F=7.92.  The critical value for those degrees of freedom at a significance level of alpha=0.10 is 1.957.  As 7.92 is greater than 1.957, we must conclude that these samples are not all drawn from the same population.
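The whole computation fits in a few lines.  Feeding in the bag-by-bag totals from the table above reproduces the SSW, SSB, and F computed by hand (the critical value would still come from an F table or a library):

```python
# Bag-by-bag totals for the six groups, from the table above.
groups = [
    [18, 17, 19, 18, 19, 26, 16, 21, 25, 28],   # Group 1
    [16, 16, 16, 16, 16, 16, 16, 16, 16, 17],   # Group 2
    [8, 7, 9, 7, 18, 20, 18, 21, 7, 6],         # Group 3
    [15, 17, 20, 20, 18, 17, 18, 17, 16, 17],   # Group 4
    [16, 21, 18, 20, 18, 18, 21, 18, 19, 18],   # Group 5
    [16, 16, 18, 16, 18, 17, 16, 18, 19, 19],   # Group 6
]

m = len(groups)                   # m = 6 groups
n = len(groups[0])                # n = 10 bags per group
everything = [x for g in groups for x in g]
grand_mean = sum(everything) / len(everything)

# Sum of squares within: deviations from each group's own mean.
ssw = sum((x - sum(g) / n) ** 2 for g in groups for x in g)
# Sum of squares between: each group's mean against the grand mean.
ssb = sum(n * (sum(g) / n - grand_mean) ** 2 for g in groups)

F = (ssb / (m - 1)) / (ssw / (m * (n - 1)))
print(round(ssw, 1), round(ssb, 1), round(F, 2))  # 568.6 417.1 7.92
```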

This came as something of a surprise to me.  As an educator with over 30 years of experience, I immediately suspected student error.  Looking at the SSW and SSB tables above, I noted that the numbers from group 3 were considerably larger than the rest.  I was curious as to whether and how they had erred.  Discerning this was easy because I had had the students document their process.  In looking at the documentation from group 3, I found the following photograph:

The students had been told to use Fun Size M&Ms.  It was assumed that they would be plain and that the bags would not be mixed.  We are well tutored in how one spells ass-u-me.

I would be remiss at this point, however, if I did not say that I have pushed this further. Eliminating group 3 does not fix the problem.  The remaining groups are not sampling the same population, and an examination of the documentation of the other groups does not reveal a similar glaring error in method.   Of all six groups, only 4 and 6 seem to be sampling the same population.

I will be having my class do a similar experiment this semester--with better instructions from the teacher--and after this I will conduct this study again.