# AP Stat Question Burning – a new series! Zuties should have accepted a qualitative response

Okay, so maybe I’m still feeling aggressive because of the steroid shot I got yesterday, but I’ve got fire burning and I want to let it out! Watch out AP Statistics exam, hear me roar! I will note that I actually expect to be roundly disagreed with on this one, and I can totally accept that. The confidence interval problem I think is a real one. This one is more questionable.

Last year, the AP Statistics exam had a question that I honestly think/thought is awesome. Zuties!

Everybody liked this problem, but I don’t think the scores were good. Part (a) required a chi-squared test of association/independence for full credit (3 of the 4 points), which was not necessarily obvious to all students. So my argument here is with “convincing statistical evidence.” Apparently in AP Stats world that phrase has become secret code for “do a test,” and I would like to argue that is not a good thing. This universal “testing is the only way!” mindset is what has led to the over-use of p-values as a measure of success in scientific journals, the rise of p-hacking, and a general lack of understanding of the nuances of statistical arguments in scientific papers. It also contributes to the problem of readers (e.g., untrained journalists) not really understanding the results and sensationalizing them. THERE ARE OTHER WAYS.

For example, consider this bar graph:

That is a lovely bar graph. And from that bar graph it is incredibly obvious that the apple-ad group is REALLY different from the other two. I don’t need to run a test to know that; it’s clear. We spend a lot of time in early chapters interpreting bar graphs like this, and this is exactly the kind of question we ask: “Is there a relationship between type of ad and Zuties chosen?” That question is easily and comfortably answerable here without running a test. And I think it is statistically convincing. I am convinced by these statistics, and I think anybody would be.
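To make the “just read the graph” point concrete, here is a minimal sketch of the descriptive comparison a student would do. The counts below are hypothetical stand-ins, not the actual Zuties data; the point is that the conditional proportions, which is all a segmented bar graph displays, tell the story on their own.

```python
# Hypothetical counts (NOT the real exam data): candy chosen by ad group.
# Each group has 30 students; entries are (chocolate, apple).
counts = {
    "no ad":        (18, 12),
    "chocolate ad": (20, 10),
    "apple ad":     (8, 22),
}

# Conditional proportion choosing apple within each ad group --
# exactly what you read off the bar graph.
for group, (choc, apple) in counts.items():
    share = apple / (choc + apple)
    print(f"{group:13s} apple share: {share:.2f}")
# no ad          apple share: 0.40
# chocolate ad   apple share: 0.33
# apple ad       apple share: 0.73
```

With numbers like these, the apple-ad group's share is roughly double the other two, and no formal machinery is needed to call that a relationship.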

We should only need to run a test in situations where the difference is not obvious. If the middle group were 18 chocolate and 7 apple, then I’d say, “Hmm, it’s hard to tell if they are different enough to matter,” and a hypothesis test would be needed. But this one is obvious. And I know I’m right, because if you DO run the test, the p-value is, I kid you not, 0.006. That is INCREDIBLY tiny in AP Stat terms. This difference is HUGE. Saying that we needed a test to confirm it is the height of p-value worship and distrust of reasonable interpretation.
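For completeness, the chi-squared computation itself is simple enough to sketch by hand. The table below uses hypothetical counts, not the actual exam data, so the p-value comes out near but not exactly at 0.006; the arithmetic is the same either way. The closed-form p-value works only because a 3×2 table has df = 2, where the chi-squared survival function reduces to exp(−x/2).

```python
import math

# Hypothetical 3x2 table (NOT the real exam data): rows are ad groups,
# columns are (chocolate, apple), 30 students per group.
observed = [
    [18, 12],  # no ad
    [20, 10],  # chocolate ad
    [8, 22],   # apple ad
]

total = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count under independence: (row total * column total) / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

# df = (rows - 1) * (cols - 1) = 2, and for df = 2 the chi-squared
# survival function is exactly exp(-x/2) -- no stats library needed.
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
# chi2 = 11.028, p = 0.0040
```

The test, in other words, only confirms what the bar graph already screamed: a p-value this small is the formal echo of a difference you could see from across the room.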