Okay, so maybe I’m still feeling aggressive because of the steroid shot I got yesterday, but I’ve got fire burning and I want to let it out! Watch out AP Statistics exam, hear me roar! I will note that I actually expect to be roundly disagreed with on this one, and I can totally accept that. The confidence interval problem I think is a real one. This one is more questionable.

Last year, the AP Statistics exam had a question that I honestly thought (and still think) is awesome. Zuties!

Everybody liked this problem, but I don’t think the scores were good. Part (a) required a chi-squared test of association/independence for full credit (3 out of 4 points), which was not necessarily obvious to all students. So my argument here is with “convincing statistical evidence”. Apparently in AP Stats world, that has become secret code for “do a test”, but I would like to argue that is not a good thing. This sort of universal “testing is the only way!” attitude is what has led to the over-use of p-values as a measure of success in scientific journals, the rise of p-hacking, and in general a lack of understanding of the nuances of statistical arguments in scientific papers. It also contributes to the problem of people reading those articles (e.g., untrained journalists) not really understanding the results and sensationalizing them. THERE ARE OTHER WAYS.

For example, consider this bar graph:

That is a lovely bar graph. And from that bar graph it is *incredibly obvious* that the apple ad group is REALLY different from the other two. I don’t need to run a test to know that. It’s clear. We spend a lot of time in early chapters talking about interpreting these bar graphs, and this is the kind of thing we ask: “Is there a relationship between type of ad and Zuties chosen?” is an easily, comfortably answerable question here without running a test. And I think it is statistically convincing. I am convinced by these statistics, and I think anybody would be.

We should only need to run a test in situations where the difference is not obvious. If the middle group were 18 chocolate and 7 apple, then I’d say, “Hmm. It’s hard to tell if they are different enough to matter,” and a hypothesis test would be needed. But this one is obvious. And I know I’m right, because if you DO run the test, the p-value is, I kid you not, *0.006*. That is INCREDIBLY tiny in AP Stat terms. This difference is HUGE. Saying that we needed a test to confirm that is the height of p-value worship and distrust of reasonable interpretation.
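For the curious, here’s a quick sketch of what running that chi-squared test looks like in Python, using `scipy.stats.chi2_contingency`. The counts below are HYPOTHETICAL numbers I made up to mimic the “obvious” pattern in the bar graph — they are not the real exam data:

```python
# Sketch of a chi-squared test of association, with HYPOTHETICAL counts
# chosen to mimic the "obvious" bar-graph pattern (not the real exam data).
from scipy.stats import chi2_contingency

# Rows: ad group (no ad, choco-zuties ad, apple-zuties ad)
# Columns: Zuties flavor chosen (chocolate, apple)
observed = [
    [20, 5],   # no ad
    [19, 6],   # choco-zuties ad
    [9, 16],   # apple-zuties ad
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# With counts this lopsided, the p-value comes out tiny -- the test
# just confirms what the bar graph already made obvious.
```

The point stands either way: when one bar is this far out of line with the others, the machinery only rubber-stamps what your eyes already told you.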

Moreover, if a qualitative answer is allowed and used for the first question, then the second question becomes more obvious. The choco-zuties ad didn’t really change any minds relative to no ad, so there’s no real reason to run choco-zuties commercials, but the apple-zuties ad clearly changes things. That is a qualitative answer, based on the bar graph, and IS what the second question wanted. All that work to get a p-value just to confirm something obvious that doesn’t actually help answer the most interesting question about this data. In fact, that second question makes me think that a much more reasonable question in the first place would have been “Do the apple ads or chocolate ads seem to differ from the group that received no ads?” That could be quantitatively answered by two separate 2-prop-z-tests (or chi-square tests with one degree of freedom – or confidence intervals!): one would have a very high p-value (chocolate is close to none – honestly, too obvious to need a test) and one would have a very low one (apple very different from none – honestly, too obvious to need a test). That would have provided more useful quantitative evidence for this last question anyway.
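To sketch what those two pairwise comparisons might look like — again with HYPOTHETICAL counts, and a hand-rolled two-proportion z-test rather than any particular calculator routine:

```python
# Two separate two-proportion z-tests on HYPOTHETICAL counts (not the
# real exam data): compare each ad group to the no-ad group.
from math import sqrt
from scipy.stats import norm

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test using the pooled proportion."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * norm.sf(abs(z))  # two-sided p-value

# Number choosing apple-zuties out of 25 subjects per group (made-up)
n = 25
apple_no_ad, apple_choc_ad, apple_apple_ad = 5, 6, 16

p_choc = two_prop_z_test(apple_choc_ad, n, apple_no_ad, n)
p_apple = two_prop_z_test(apple_apple_ad, n, apple_no_ad, n)
print(f"choco ad vs no ad: p = {p_choc:.3f}")   # large: no real change
print(f"apple ad vs no ad: p = {p_apple:.4f}")  # tiny: clear change
```

With numbers shaped like the bar graph, the chocolate-vs-none comparison comes out nowhere near significant while the apple-vs-none comparison is wildly significant — which is exactly the qualitative story the graph told for free.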

Look, if the AP exam wants to make sure students run hypothesis tests, that is totally fine. But phrasing things as “convincing statistical evidence” when what you really mean is “run a hypothesis test” is NOT fair. There are many types of convincing statistical evidence. Most people are far more likely to be convinced by that bar graph than a p-value, and since the bar graph is not at all deceiving, that’s fine. My message to the AP exam is simply this: ask the question you mean. Say “Use an appropriate hypothesis test” if that’s what you want students to do. Or in a problem like this, just make the answer less obvious, so a test actually feels necessary. Because pretending that hypothesis tests are the only reasonably convincing statistical evidence available is, in my opinion, false.

I agree with you here. From an assessment point of view, however, I do think it’s nice that the AP gives scenarios that are really obviously significant (or not). In general, though, this question was not my favorite, by any stretch. The language was a little confusing and there was a lot of debate (even among the teachers and readers) about what could and should be communicated.