Waiting for Tilapia.
Waiting for Tilapia could be the title of a new indie film about the perils of climate change. But no, I was literally waiting for fish. I found myself staring aimlessly at the Tilapia in my local Publix(R) supermarket. The fresh fish is located behind a glass shield which requires a Publix employee to retrieve said fish. Don't get me wrong, I understand the need to isolate the fish from the customers. It would be undesirable to have customers squeezing and handling raw fish the same way they do oranges. My lamenting is due to the fact that I had been waiting for several minutes with no Publix employee in sight. The seafood section was unmanned and the people who are responsible for attending the seafood were cleverly located behind a door marked with the words "Publix Associates Only". After a few minutes, I gave up and opted to use chicken, instead. However, during that wait I had a great idea for this article.
While waiting, I bemoaned for the 1,000th time that we do not have Wegmans in Central Florida. If you are lucky enough to live in the vicinity of Wegmans supermarkets, then you are experiencing grocery store nirvana. I travel to upstate New York regularly and have grown accustomed to the fresh food bar, Chinese buffet, prepared vegetables and main courses, and a cheese selection that is rivaled only by a Whole Foods. In my opinion, there is a drastic difference in the selection and quality between a Publix and a Wegmans. But does this come at a cost? Does Wegmans charge more for their food to provide their superior shopping experience? Since I had recently had a question about the Paired t-Test, I decided to use the Paired t-Test to determine if Wegmans was truly more expensive.
The Paired t-Test can be used when your data comes in logical pairs. A statistician might say that you "may be able to use a paired t-Test if you have additional information about each sample". However, that definition is unhelpful to the average practitioner, so let me clarify. The classic use of the Paired t-Test is to evaluate the before and after of some treatment. For example, measure the blood pressure of patient A, give them something (pharmaceutical, exercise, Tilapia) to reduce their blood pressure, then measure the blood pressure of patient A again. Repeat for patients B, C, D, ... In this case, the data of "Before" and "After" are paired by patient.
Systolic Blood Pressure Readings

| Patient | Before | After | Pair |
| --- | --- | --- | --- |
| A | 170 | 165 | Pair #1 (Patient A Before and After) |
| B | 110 | 112 | Pair #2 (Patient B Before and After) |
| C | 134 | 131 | Pair #3 (Patient C Before and After) |
Some people might think that Patient A benefitted, as their "After" pressure is lower than "Before". Not necessarily - it is entirely possible that the "After" pressure is the result of random variation. That is, we could have done nothing and there would still be a 5 point decrease in Blood Pressure. More on this later. The Before and After example is a great way to introduce the concept of pairs; however, there are many more applications of a paired t-Test than before and after testing. Generally, you can use a Paired t-Test when the following conditions are met.
1. A data point from the first group can only be paired with a data point from the second group. Example: the 170 for Patient A "Before" should only be paired with the 165 for Patient A "After". It wouldn't be logical to pair the before data point with 112 or 131 as those were for different patients.
2. You must have exactly the same number of observations for the first and second groups. If you have 31 observations for Group A and 30 observations for Group B, that isn't close enough. It must be exactly the same. This should make sense: if every observation belongs to a pair, how could the number of points be different?
3. If the data was taken as random samples, you can't use the Paired t-Test even if there is a pairing factor. For example, if we measured the blood pressure of the patients two times (without giving them the medicine) then the Paired t-Test loses meaning.
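Conditions 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name `paired_differences` is made up for this example, and it assumes the two lists are ordered so that position i in each list belongs to the same patient.

```python
def paired_differences(before, after):
    """Return (after - before) for each pair, in order.

    Assumes before[i] and after[i] were measured on the same patient.
    """
    # Condition 2: both groups must have exactly the same number of observations.
    if len(before) != len(after):
        raise ValueError("paired data needs the same number of observations in both groups")
    # Condition 1: each value is only ever matched with its own partner.
    return [a - b for b, a in zip(before, after)]


# Patients A, B, and C from the blood pressure table.
before = [170, 110, 134]
after = [165, 112, 131]

differences = paired_differences(before, after)
print(differences)  # [-5, 2, -3]
```

A negative value means the "After" reading was lower than the "Before" reading for that patient.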
A common question is why we should use the Paired t-Test instead of the more common (unpaired) t-Test. The "normal" two sample t-Test doesn't make the assumption of pairing, so what value do we get from collecting the data in "pairs"? The statistical answer is that the Paired t-Test has more power than the normal t-Test. That is a technical way of saying that the paired t-Test can help us detect differences that the (unpaired) t-Test may miss. This is particularly true if there are outliers in the data or if the data set has a lot of variation. To help demonstrate this, I am going to use both the Paired t-Test and (unpaired) t-Test to compare the prices at Wegmans and Publix.
If you would like to know more about the (unpaired) t-Test, read this article on Roger Clemens and Barry Bonds and alleged drug use in baseball.
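For a quick feel of the difference in power, here is a rough sketch that computes both t statistics on the three blood pressure pairs from the table earlier. Three patients is far too few for a real study, so treat this purely as an illustration. The function names are invented for this example, and the unpaired version uses the unequal-variance (Welch) form of the standard error, which is my assumption; with equal group sizes it gives the same t statistic as the pooled form.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(x, y):
    """t statistic computed on the per-pair differences (y - x)."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / sqrt(variance(d) / len(d))


def unpaired_t(x, y):
    """Two-sample t statistic that ignores the pairing (Welch standard error)."""
    std_err = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / std_err


before = [170, 110, 134]
after = [165, 112, 131]

t_paired = paired_t(before, after)
t_unpaired = unpaired_t(before, after)
print(round(t_paired, 3), round(t_unpaired, 3))
```

Both tests see the same average change of -2 points, but the paired statistic is roughly eleven times larger in magnitude. The unpaired test has to contend with the large patient-to-patient variation (110 versus 170), while the paired test only looks at the variation of the differences.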
My hypothesis is that I pay less for groceries at Publix than at Wegmans. Ideally, I would make a list of all the products I purchase at a supermarket and compare the price of each item at Wegmans and Publix. While this is theoretically possible, it would take a ridiculous amount of time. Fortunately, I can use a random sample and a Paired t-Test to answer the question in far less time.
Step 1 is to randomly select 30 products that I normally purchase at a supermarket. It is important to make these "pairs" and so care should be taken that they are identical. For example, it would be a bad idea to compare the price of 1 Gallon of orange juice at Wegmans to 1 Quart of OJ at Publix (we would expect one quart to be less expensive than one gallon). You can download the full list of products in an Excel Workbook; a sample of 5 of the 29 products is in the table below. Note: one of the products was not available at both Wegmans and Publix, so I removed that item from the list.
| Product | Wegmans Price | Publix Price |
| --- | --- | --- |
| skim milk, gallon | 1.89 | 3.55 |
| Activa yogurt, plain, large | 2.69 | 2.39 |
| eggs, large, Grade A, dozen | 1.29 | 2.69 |
| Jif creamy peanut butter, 40 oz. | 4.39 | 5.87 |
| Diet Coke, 12 pack | 3.33 | 4.99 |
Note: these prices are not fictional, they are actual prices taken on March 6th, 2011 from the Wegmans in Webster, NY, and subsequently March 8th, 2011 from the Publix in Windermere, FL.
Like all hypothesis tests, the Paired t-Test starts with two hypotheses, the null and the alternate. In the case of the paired t-Test, they are based on the difference in each pair. Specifically...
H0(Null): Mean Difference (Group A - Group B) = 0
H1(Alt): Mean Difference (Group A - Group B) not equal to 0
For our specific test, we will substitute the generic terms "Group A" and "Group B" with the actual groups which are Wegmans Price and Publix Price.
H0(Null): Mean Difference (Wegmans Price - Publix Price) = 0
H1(Alt): Mean Difference (Wegmans Price - Publix Price) not equal to 0
The "Difference" in "Mean Difference" is the Wegmans price minus the Publix price within each pair. This is probably easier to understand by example. In the table below, the difference for each pair is in the new column entitled "Difference". For skim milk, the difference is $1.89 - $3.55, which is -$1.66. This calculation continues for all 29 rows. The average difference is then calculated for all 29 pairs. The table below only shows the first 5 pairs; click here to download the full dataset in the file Publix and Wegmans Prices for Paired t-Test.xls.
| Product | Wegmans Price | Publix Price | Difference |
| --- | --- | --- | --- |
| skim milk, gallon | 1.89 | 3.55 | -1.66 |
| Activa yogurt, plain, large | 2.69 | 2.39 | 0.30 |
| eggs, large, Grade A, dozen | 1.29 | 2.69 | -1.40 |
| Jif creamy peanut butter, 40 oz. | 4.39 | 5.87 | -1.48 |
| Diet Coke, 12 pack | 3.33 | 4.99 | -1.66 |
| (24 more products...) | | | |
| Average (all 29 products) | | | -0.809 |
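The Difference column is simple enough to reproduce in a few lines of Python. This sketch uses only the five products printed in the table, so its average will not match the -0.809 figure, which covers all 29 products in the downloadable workbook.

```python
from statistics import mean

# (product, Wegmans price, Publix price) for the five rows shown in the article.
prices = [
    ("skim milk, gallon", 1.89, 3.55),
    ("Activa yogurt, plain, large", 2.69, 2.39),
    ("eggs, large, Grade A, dozen", 1.29, 2.69),
    ("Jif creamy peanut butter, 40 oz.", 4.39, 5.87),
    ("Diet Coke, 12 pack", 3.33, 4.99),
]

# Difference is always Wegmans - Publix; rounding to cents hides float noise.
differences = [round(wegmans - publix, 2) for _, wegmans, publix in prices]
average_difference = round(mean(differences), 2)

print(differences)         # [-1.66, 0.3, -1.4, -1.48, -1.66]
print(average_difference)  # -1.18
```

A negative difference means the Wegmans price was lower for that product.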
Many people would stop at this point and declare that Wegmans is less expensive since the average difference is negative (remember the difference is Wegmans - Publix so a negative difference means that Wegmans is cheaper). This would be a poor conclusion. The reason we can't jump to the conclusion that Wegmans is less expensive (this is a bad time to be in your happy place) is that we have a sample, not the entire population. If I were to compare the cost of my supermarket purchases for the duration of my lifetime at Wegmans vs. Publix then I wouldn't have to worry about all this hypothesis test stuff. That, however, is outside the realm of possibilities. Therefore, I am forced to use a sample and a sample comes with error.
A wise man once said, "Given two numbers, one will be greater". Granted, there is a very small chance that the average difference would come out to be exactly zero but more likely it will be in favor of either Wegmans or Publix. What if the difference were .01 (one penny) in favor of Wegmans? Would you be quick to say that shopping at Wegmans over the long term would save you money over Publix? Perhaps Wegmans just received a large shipment of eggs and has them discounted to move them before they go bad. Perhaps if you chose a different 29 products the difference would come out as .01 in favor of Publix. Hopefully, you see that for a very small difference our confidence that Wegmans is truly cheaper could be in error.
What if the average difference over 29 products was $100? (I realize this is absurd; I am making a point.) For this to happen, the Wegmans price for eggs would be $1.29 and the Publix price would be $101.29 (on average). With a difference this big, we could conclude that Wegmans is less expensive with very little chance of error.
Summary so far...
If the average difference is $-0.01 we know we have a large chance of making an error if we conclude that Wegmans is less expensive.
If the average difference is $-100 we know we have little chance of making an error if we conclude that Wegmans is less expensive.
It would be nice if we could calculate the probability of making a mistake based upon the difference (and sample size). This is where the Paired t-Test comes in. It calculates a "p value" which is exactly that - it is the probability of making a mistake. Put more formally (and I am about to slip into stats speak so I don't get any nasty grams from uptight statisticians), we can calculate the probability of making a mistake if we conclude the H1 or Alternate Hypothesis. Remembering that our hypothesis table looks like this...
H0(Null): Mean Difference (Wegmans Price - Publix Price) = 0
H1(Alt): Mean Difference (Wegmans Price - Publix Price) not equal to 0
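For readers who want to see what is under the hood, the textbook Paired t-Test statistic is built entirely from the differences. The symbols below are my own notation, not something taken from Quantum XL, and I am assuming the software follows this standard form: d_i is the difference for pair i (Wegmans Price - Publix Price), and n is the number of pairs.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i ,
\qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2} ,
\qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}} ,
\qquad
\text{degrees of freedom} = n - 1
```

The p value is then read from the t distribution with n - 1 degrees of freedom. A large average difference, a small spread in the differences, or a larger sample all push t further from zero and the p value closer to zero.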
Enough theory, let's do the math. Using Quantum XL's Paired t-Test functionality I calculated the Paired t-Test with the results below.

Quantum XL breaks the analysis into three sections, Hypothesis Tested, Results, and Dataset Statistics. The dataset statistics is just nice-to-have information, but the Hypothesis Tested and Results deserve special attention.
The Hypothesis Tested section restates the hypothesis in the event that you don't remember all the details from this article. If your datasets include titles (i.e., the words "Wegmans" and "Publix") then the hypothesis will be stated in those terms. I highly recommend using titles as it simplifies later interpretation. Once more, the null hypothesis is the mean difference is zero; the alternate is the mean difference is not zero.

The Results section has "the answer" or our P Value, our probability of making a mistake if we conclude the price difference is not zero. In our case, the P value is so close to zero that Excel rounds it to zero.

In actuality, the probability of making a mistake can never truly be zero; Excel is rounding the results. I reformatted the results and displayed the P Value to more digits with the result below. If we conclude that the population prices are different, then we only have a .0000153 (or .00153%) chance of making a mistake. To put this in perspective, we have 15 chances out of a million of making a mistake. Note: the T Value is a bit geeky, and unless you want to understand the math behind this test, we don't need to use it.

Below the P Value is some supporting information. Since we know that the difference is likely not zero, then what is it? Well, in this case we can be 95% confident that the difference is between $-1.13 and $-.49, with the estimated difference equal to $-0.81. Or, put more simply, Wegmans is less expensive by $.49 to $1.13.
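The confidence interval follows the usual recipe of "estimate plus or minus a critical value times the standard error". Here is a hedged sketch using only the five products printed in this article, so the interval will be wider than, and different from, the one reported for all 29 products. The critical value 2.776 is the two-sided 95% value for 4 degrees of freedom taken from a standard t table; the variable names are mine.

```python
from math import sqrt
from statistics import mean, stdev

# Wegmans - Publix differences for the five products shown in the article.
differences = [-1.66, 0.30, -1.40, -1.48, -1.66]

# Two-sided 95% critical value for n - 1 = 4 degrees of freedom (standard t table).
T_CRIT_95_DF4 = 2.776

n = len(differences)
d_bar = mean(differences)                 # estimated mean difference
std_err = stdev(differences) / sqrt(n)    # standard error of the mean difference
t_stat = d_bar / std_err

lower = d_bar - T_CRIT_95_DF4 * std_err
upper = d_bar + T_CRIT_95_DF4 * std_err

print(f"mean difference = {d_bar:.2f}, t = {t_stat:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

With these five rows the interval runs from roughly -2.22 to -0.14. It is much wider than the 29-product interval because five pairs carry far less information, but it still excludes zero.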
If you remember, this started when I was waiting for Tilapia in my local Publix. I hypothesized that Wegmans must charge more for their superior shopping experience (in my opinion). I established my hypothesis and then collected my sample. Side note: I almost fell out of my chair when I saw that Wegmans was less expensive than Publix for these 29 items. However, I can continue my analysis to determine if this is random variation or a real difference. Using the sample, I calculated the P Value at .0000153. At this point, I must be careful about what this means. So let's go through a few options of how to state this.

Method 1: The uptight stats Nazi method. If you read medical journals for entertainment you may see similar statements.
I choose to reject the null hypothesis and conclude the alternate. There is a .00153% chance that this is an error, which is below my previously stated threshold of risk of 5%.
Method 2: More friendly but still correct method
Based upon our sample of 29 products, we can conclude that the prices of Wegmans and Publix are different with a chance of error equal to .000015.
Method 3: Easiest to understand (caution: may make uptight statisticians squirm)
Based upon my sample of 29 products, I am 99.99846% confident that the prices at Wegmans and Publix are different.
The reason Method 3 causes pause is due to the swap from "probability of an error" to "percent confidence". Personally, I use this method as more people can relate to this interpretation than the other two.
Finally, I should note a common statement that is simply wrong, and not just to the uptight stats Nazis.
Method 4: Commonly stated but incorrect
There is a .00153% chance that the means are equal.
What I just changed was subtle so let me ensure you caught it. Instead of expressing the percent error as "not equal" (which is the H1 or alternate hypothesis) I switched to "equal" which is the H0 or null hypothesis. We should not do this. It is one thing to calculate the probability that two means are not equal, but we can't really talk about equality. Why? Well, that is a somewhat hard concept and one of the most confusing parts of a hypothesis test. We can find evidence that the means are not equal, and express confidence that the means are not equal, but this is not the same as evidence that the means are equal.

Two-sided test
H0(Null): Mean Difference (Wegmans Price - Publix Price) = 0
H1(Alt): Mean Difference (Wegmans Price - Publix Price) not equal to 0
One-sided test Method 1 (Is Publix Less Expensive)
H0(Null): Mean Difference (Wegmans Price - Publix Price) <= 0
H1(Alt): Mean Difference (Wegmans Price - Publix Price) > 0
One-sided test Method 2 (Is Wegmans Less Expensive)
H0(Null): Mean Difference (Wegmans Price - Publix Price) >= 0
H1(Alt): Mean Difference (Wegmans Price - Publix Price) < 0
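To see how the direction of the alternate hypothesis changes the decision, here is a small sketch that compares a t statistic against a one-sided critical value. It uses only the five products printed in this article, the helper `rejects_null` is a made-up name, and 2.132 is the one-sided 5% critical value for 4 degrees of freedom from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev


def rejects_null(t_stat, t_crit, alternative):
    """Decide a one-sided test by comparing t against a critical value.

    alternative="greater" tests H1: mean difference > 0 (Publix less expensive).
    alternative="less" tests H1: mean difference < 0 (Wegmans less expensive).
    """
    if alternative == "greater":
        return t_stat > t_crit
    if alternative == "less":
        return t_stat < -t_crit
    raise ValueError("alternative must be 'greater' or 'less'")


# Wegmans - Publix differences for the five products shown in the article.
differences = [-1.66, 0.30, -1.40, -1.48, -1.66]
t_stat = mean(differences) / (stdev(differences) / sqrt(len(differences)))

T_CRIT_ONE_SIDED_DF4 = 2.132  # one-sided 5% critical value, 4 degrees of freedom

publix_cheaper = rejects_null(t_stat, T_CRIT_ONE_SIDED_DF4, "greater")
wegmans_cheaper = rejects_null(t_stat, T_CRIT_ONE_SIDED_DF4, "less")
print(publix_cheaper, wegmans_cheaper)  # False True
```

The same data can only support one direction: the t statistic is strongly negative, so Method 2 (Is Wegmans Less Expensive) rejects its null while Method 1 does not.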

I am not affiliated with either the Wegmans or Publix company. I have not been paid by either party and have never worked with either company.
This analysis was specific to my shopping habits. I didn't choose 30 products at random from the whole store; I chose 30 products at random from my shopping list. Why? My goal was to see if I would benefit from shopping at Wegmans (not the average consumer). A different sample from another shopping list may result in a different answer.
According to Wikipedia, Wegmans is privately owned with Danny Wegman serving as the CEO. Danny, if you are reading this, please, I am begging you: extend your chain of supermarkets to Central Florida. I am tired of waiting for overpriced Tilapia.
I intended to post some pictures with the article; however, the Wegmans Consumer Affairs department would not allow pictures to be taken from within the store. This is the one head scratcher in mostly good experiences with Wegmans. It's not like they are designing stealth bombers; what about cheese and produce displays could be a secret? Perhaps their Tilapia are working with the CIA. The Wegmans Consumer Affairs also declined an interview and comment.