Programming for Statistics

CPM makes textbooks that live on paper, but in AP Stats we are taking the leap and assuming that most classes have access to at least a few computers for demonstration and exploration. So what does this mean?

Stats applets!

Most textbooks come with a series of statistics applets that help demonstrate concepts: for example, applets to explore correlation, bootstrap sampling, or the binomial distribution. In addition to the ones that come with their own textbook, most stats teachers use e-Tools from all over the internet, many of them provided by other teachers (Bob Lochel, linked there, makes some amazing Desmos tools) or by other textbooks.

One of the difficulties of writing a text for sale (even if it is by a generally well-meaning non-profit) is that we can’t write up the lessons we’ve made using those applets. We have to create our own.

How? Well, one amazing option is Desmos. We have a partnership with them, and you can expect to see Desmos integration in many spots in our text, like an Activity Builder version of this guy:

[Image: gifsmos]

For scatterplot exploration with ease, Desmos is pretty great. But it’s not perfect. And there are many things (simulation, inference) that it can’t do at all.

Luckily, when looking at the amazing list of applets from the text Art of Stat, I realized they were all made using the same tool: Shiny, an open-source tool that lets you build interactive statistical websites in the statistical language R.

So naturally, I set about learning Shiny. And R.

It started out rough. This exploration of confidence intervals for proportions is the first thing I made, and this exploration of the sampling distribution for proportions came soon afterward. A version of the confidence interval exploration for means came next (pictures below).

There are better-looking versions of all of these applets out there already: these really don’t bring much to the table that you can’t find elsewhere, but of course I was able to design them as I wrote the problems, so they integrate perfectly with our sequence.
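For anyone curious what an applet like the confidence interval exploration is doing under the hood, here is a minimal sketch. It is written in Python rather than R, and it is not the code behind the actual Shiny apps. It assumes the standard one-proportion z-interval, p-hat ± z*·sqrt(p-hat(1 − p-hat)/n), and every function name and parameter value in it is made up for illustration.

```python
import math
import random


def proportion_interval(successes, n, z_star=1.96):
    """One-proportion z-interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage_rate(true_p, n, trials, z_star=1.96, seed=None):
    """Fraction of simulated intervals that capture the true proportion."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_interval(successes, n, z_star)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical settings: true proportion 0.4, samples of 100, 2000 samples.
    rate = coverage_rate(true_p=0.4, n=100, trials=2000, seed=1)
    print(f"{rate:.1%} of the intervals captured p = 0.4")
```

With z* = 1.96 the capture rate should hover somewhere near 95%, and shrinking n or pushing true_p toward 0 or 1 is a quick way to see where the interval starts to struggle.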

Once I got better with Shiny, though, the possibilities really started to open up. My biggest project yet is my Scatterplot Generator. I originally designed it to let me make black-and-white scatterplots (for our black-and-white textbook), but I got very excited adding features to it. I now think it’s the easiest-to-use, best-featured web-based scatterplot generator I’ve seen. It calculates residuals, makes residual plots, does regression (with AP-style output), lets you change the scale, the size, and the font for labels, hide labels or titles, and even use LaTeX to format the labels. Plots can be downloaded as PNG, PDF, or SVG files. Data can be entered as comma-, space-, or tab-delimited text (tab being what you get if you copy-and-paste from a spreadsheet). And if you turn on “Click-to-add” you can click the scatterplot OR the residual plot to create new points, and watch the plot change as soon as you do so.

[Image: scatterplot]

Does it do everything? No. I plan to add color options and possibly the ability to graph more than one data set (using different colors or shapes for the other sets) eventually. But if you’re okay with black-and-white (or are advanced enough to download the SVG and change the colors yourself) then it’s a pretty powerful tool as is.
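The number-crunching behind a tool like this is pretty compact. Here is a rough sketch in Python (again, not the R code the Scatterplot Generator actually runs) of reading comma-, space-, or tab-delimited points, fitting the least-squares line, and computing residuals. The function names and the sample data are hypothetical.

```python
import re


def parse_points(text):
    """Read one (x, y) pair per line; values may be split by commas, tabs, or spaces."""
    points = []
    for line in text.strip().splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        try:
            x, y = (float(f) for f in fields)
        except ValueError:
            continue  # skip headers, blank lines, and malformed rows
        points.append((x, y))
    return points


def least_squares(points):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(points, slope, intercept):
    """Residual = observed y minus predicted y for each point."""
    return [y - (intercept + slope * x) for x, y in points]


if __name__ == "__main__":
    # Made-up data mixing all three delimiters, plus a header row.
    raw = "x,y\n1,2\n2\t4\n3 5"
    pts = parse_points(raw)
    slope, intercept = least_squares(pts)
    print(f"y-hat = {intercept:.3f} + {slope:.3f}x")
    print([round(r, 3) for r in residuals(pts, slope, intercept)])
```

Splitting on any run of commas or whitespace is the simplest way to accept all three formats at once, though it would mangle data that uses commas as decimal marks.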

My project for the last couple of days is my Univariate Data Explorer.

[Image: univariate]

It currently allows you to enter up to 5 univariate data sets as either lists of data or delimited frequency (or relative frequency) tables. It will currently only create histograms (counts, relative frequency, or density, and with multiple sets stacked, split, or comparative), but eventually I will add box plots and summary data, as well as the ability to graph normal curves (and possibly other stat functions) on top of the histograms. It has some color customization but not much – increasing that is on the list as well, as is adding download buttons. But again, it’s already pretty usable for histograms.
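If the difference between the three histogram scales is fuzzy, this small Python sketch may help. It is an illustration only, not the explorer's actual R code, and the function names, bin settings, and data are invented. It bins one data set and reports counts, relative frequencies, and densities, and it also shows one way a frequency table could be expanded into a plain list of data.

```python
def expand_frequency_table(table):
    """Turn (value, count) rows into a flat list of data values."""
    data = []
    for value, count in table:
        data.extend([value] * count)
    return data


def histogram(data, bin_start, bin_width, n_bins):
    """Bin data into n_bins equal-width bins of the form [left, left + width)."""
    counts = [0] * n_bins
    for value in data:
        index = int((value - bin_start) // bin_width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no data fell inside the bins")
    relative = [c / total for c in counts]        # bar heights sum to 1
    density = [r / bin_width for r in relative]   # bar areas sum to 1
    return {"counts": counts, "relative": relative, "density": density}


if __name__ == "__main__":
    # Made-up frequency table: value 1 once, 2 twice, 3 three times, and so on.
    data = expand_frequency_table([(1, 1), (2, 2), (3, 3), (4, 1), (7, 1)])
    result = histogram(data, bin_start=0, bin_width=2, n_bins=4)
    for name, heights in result.items():
        print(name, heights)
```

The shape of the histogram is the same on all three scales. Only the vertical axis changes, and the density scale is the one that lets a normal curve be drawn on top without rescaling.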

These e-Tools belong to CPM: I made them on (mostly) CPM time, using CPM equipment, and the copyright is theirs. However, the current plan is to keep them available for the community to use with no charge or constraints. I hope they are useful, and if you think of other tools you would like, let me know: I will consider making them, either on CPM time (for integration with our book), or on my own time for my own fun. Because boy oh boy is this fun! And if you like programming, and like statistics, I strongly encourage you to spend some time working through the Shiny tutorials – it is a very cool tool. Make something awesome and I can host it on my shiny server (http://shiny.mtbos.org) for free.

One thought on “Programming for Statistics”

  1. I am SO happy to hear the plan to keep these tools free and available to the public. I love your scatterplot generator and I’m sure I’ll be using some of your other apps as well! Thank you so much for all you do to make the stats community better!
