Today we’re going to talk about confidence intervals. Confidence intervals let us quantify our uncertainty: we define a range of plausible values for an estimate and attach a level of confidence that the true value falls within that range. They come up a lot, like when you get delivery windows for packages or when pollsters cite margins of error during elections, and we use them instinctively in everyday decisions. But confidence intervals also demonstrate the tradeoff of accuracy for precision: the greater our confidence, usually the wider, and so the less useful, our range.
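To put numbers on that tradeoff, here is a minimal sketch that builds intervals at three confidence levels for one made-up sample. The sample mean, standard deviation, and size are hypothetical, and the normal approximation is an assumption chosen to keep the example short.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary (not taken from any real data set).
sample_mean = 100.0
sample_sd = 15.0
n = 30

standard_error = sample_sd / sqrt(n)


def normal_interval(confidence):
    """Two-sided interval around sample_mean using the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


widths = {}
for confidence in (0.80, 0.95, 0.99):
    low, high = normal_interval(confidence)
    widths[confidence] = high - low
    print(f"{confidence:.0%} interval: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Each step up in confidence widens the range, which is the sense in which more confidence buys less precision.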

One thing that's really confusing: if you take a sample and calculate a 95% confidence interval from it, you cannot say that there is a 95% chance that the population mean is within that particular interval. The 95% describes the procedure, not any single interval: if we drew many samples and built an interval from each one the same way, about 95% of those intervals would contain the population mean. Once a specific interval has been calculated, the population mean is a fixed number that either is or is not inside it. Every sample also has other features besides its mean and standard deviation that bear on how likely it is that the population mean falls within its interval. If all we use is the mean and standard deviation, the 95% confidence interval is the best guess we can make, and it will capture the population mean for about 95% of samples. However, if we had the most advanced statistical modelling tools known to man and made use of those other features, the probability we would assign to the population mean being within our particular interval would likely not be exactly 95%.
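The "95% of samples" reading can be checked with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical normal distribution whose mean we pretend not to know, each sample has 30 values, and the critical value 2.045 is the two-sided 95% value of Student's t with 29 degrees of freedom, hardcoded to avoid extra libraries.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population; in real life these values are unknown.
POP_MEAN = 50.0
POP_SD = 10.0
N = 30
TRIALS = 5_000
T_CRIT = 2.045  # two-sided 95% critical value, Student's t, N - 1 = 29 degrees of freedom


def interval_from_sample(sample):
    """95% confidence interval for the mean, built only from the sample."""
    m = mean(sample)
    margin = T_CRIT * stdev(sample) / sqrt(len(sample))
    return m - margin, m + margin


hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    low, high = interval_from_sample(sample)
    if low <= POP_MEAN <= high:
        hits += 1  # this sample's interval captured the population mean

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals contained the population mean")
```

The share of intervals that capture the population mean should land close to 95%, even though any one interval either contains it or does not.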

In a paper almost certainly prepared for a statistics class, someone found that the mean length of 30 randomly selected top 100 songs from iTunes was 226.93 seconds. Seventy songs of that length will last you through a 4-hour, 24-minute marathon.
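The marathon figure is just multiplication: 70 songs at 226.93 seconds each is 15,885.1 seconds, or a little under 4 hours 25 minutes. A quick sketch of that arithmetic, with variable names chosen only for illustration:

```python
mean_length_s = 226.93  # mean song length reported above, in seconds
songs = 70

total_s = songs * mean_length_s
hours, remainder = divmod(total_s, 3600)   # whole hours, leftover seconds
minutes, seconds = divmod(remainder, 60)   # whole minutes, leftover seconds
print(f"{int(hours)} h {int(minutes)} min {seconds:.0f} s")
```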

But whether a result counts as significant will depend on whether you are using the .05 level (used here) or the more stringent .01 level; the p-value itself does not change with the level you choose.

