Different kinds of important.

On Mar 21, 2007

In clinical research, there are two kinds of important–the important kind and the unimportant kind.

The topic of statistically significant vs. practically important has received a lot of attention in the methodological literature, but alas, most researchers still conflate the two. Most studies use a cut-off point (also called the alpha level) of 0.05 to define “statistically significant”. This cut-off is entirely arbitrary; the number 0.05 is used mostly out of convention, and most people would never think to use anything else.

But what is the alpha level anyways?

The alpha level is the probability of making a particular mistake that we are willing to accept. The specific mistake: declaring a significant difference between two (or more) groups when no such difference actually exists in the larger population. And I’m sure that means just about d*ck-all to most of you still.

We use statistics to make inferences about the larger population–or the _true_ population, if you will. The BEST experiment would be to simply take every single person in the world who we were interested in studying and put them into our trial. Our error in estimating the variable of interest would be zero. But we can’t do that, so we take a sample of the true population and generalize the results we estimate from it back to the true population. However, when we do this, there’s always the chance that we’re going to snag all (or even some of) the wrong people–you know, the guys who gain muscle just by looking at slightly heavy objects; or those ever-hated guys who don’t go on “cutting” diets, because that’s just the way they are all of the time, despite a constant consumption of donuts and coffee. We can try to minimize the chances that this will happen (which has more to do with the study logistics themselves), but despite our best efforts, it still can. In that case, the conclusion that the diet/workout program/whatever works would be, in fact, the wrong conclusion for the true population, which includes people like you and me (well, definitely me, at least).
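To picture what that sampling gamble looks like, here’s a minimal sketch in Python (numpy assumed; the population mean, spread, and sample sizes are all made up for illustration): several samples drawn from the same “true” population each tell a slightly different story.

```python
# A minimal sketch of sampling error (all numbers are invented).
import numpy as np

rng = np.random.default_rng(3)
true_mean = 100.0  # the value we'd see if we could measure everyone

# Five separate samples of 25 people each, all from the same population
sample_means = [rng.normal(loc=true_mean, scale=15.0, size=25).mean()
                for _ in range(5)]
print([round(m, 1) for m in sample_means])
# Each sample gives a somewhat different estimate of the same true mean,
# and an unlucky sample can land well away from it.
```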

So, the alpha level is the probability of making this mistake (also called a type I error; yes, there is also a type II error) that we are prepared to accept. Basically, an alpha of 0.05 means that we’re prepared to risk a 5% probability of making a type I error. It also means that we will only call a result statistically significant if its p-value is 0.05 or smaller.
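If that still feels abstract, a toy simulation makes it concrete: run many experiments where no real difference exists, and roughly 5% of them will still come out “significant” at an alpha of 0.05. A minimal sketch (numpy and scipy assumed; every number here is invented for illustration):

```python
# A toy null-hypothesis simulation: both groups always come from the SAME
# population, so any "significant" result is a type I error by definition.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_trials = 10_000
false_positives = 0

for _ in range(n_trials):
    a = rng.normal(loc=100.0, scale=5.0, size=30)
    b = rng.normal(loc=100.0, scale=5.0, size=30)  # same true mean as a
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"Type I error rate: {false_positives / n_trials:.3f}")  # hovers near 0.05
```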

The decision to use 0.05 as the magic number is arbitrary, as I’ve already said. Some studies, though, use smaller alpha levels because it’s SUPER important that they don’t end up with the wrong answer. Sometimes, you’ll see alphas of 0.01.

But back to the topic at hand, what controls statistical significance?

There are three things that exert the most force on a p-value: the sample size, the difference observed between groups, and the variance within each group. There’s not a whole lot anyone can do to manipulate the observed difference or the variance, but you can exert control over the sample size. With enough subjects, any study can turn up a p-value less than 0.05, even if the difference between groups is quite small (the quick simulation below makes the point). Other times, it’s just a lucky break–biased groups can produce a difference that isn’t reflective of the true population, or you can luck out and get a comparatively narrow variance (mostly by recruiting subjects who are very similar to one another–like football players from the same team). And don’t forget that there’s always that 5% chance that you’ll stumble on a significant p-value when no difference actually exists in the true population.
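To see how much muscle the sample size alone has, here’s a minimal sketch along the same lines (numpy and scipy assumed; the means, spread, and sample sizes are invented): the exact same small difference between groups, tested at two sample sizes.

```python
# Same true difference, two sample sizes (all numbers invented).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

for n in (20, 2000):
    # A small true difference of 0.5 units against a spread of 5 units
    group_a = rng.normal(loc=100.0, scale=5.0, size=n)
    group_b = rng.normal(loc=100.5, scale=5.0, size=n)
    _, p_value = stats.ttest_ind(group_a, group_b)
    print(f"n per group = {n:4d}: p = {p_value:.4f}")

# Typically: at n = 20 the p-value is nowhere near 0.05, while at n = 2000
# the very same tiny difference comes out "statistically significant".
```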

It is for this reason that research always needs to be interpreted within the practical context in which it is performed. It is possible to uncover a statistically significant difference that, even if it is a “true” one, is totally meaningless. For instance, if you studied two groups of athletes, one that got a weight training program and one that got no structured training, and found that the weight training group could lift five pounds more than the non-training group, would ANY p-value have meaning? Probably not, unless it was a REALLY REALLY REALLY important 5 pounds (like, say, the difference between a gold and silver Olympic medal).

The moral of the story here is that a p-value provides evidence to support a theory, but it is not the be-all and end-all of a study. Often, abstracts will quote only the p-value, with no actual numbers on the observed difference. Don’t let the p-value override your common sense: a statistically significant difference does not equate to a practically significant one.

