Is protein good for you? (or, Hire a damned methodologist already)

I was perusing the Table of Contents in the Journal of Sports Science and Medicine, and noticed a study entitled, “Effects of protein supplementation on muscular performance and resting hormonal changes in college football players.”

Apart from my inherent bias against the massive attention that is paid to football players (it’s a long story), I thought it would be a good study to review. Alas, it is just another case showing why the lay public, trainers, and coaches have all become so jaded with training and nutrition studies, and with research in general. The reality is that a lot of research is crap (I can hardly wait to write my next opinion piece on that topic!). And more crappy research widens the gap between research and practice: practitioners don’t think research is worth pursuing because it’s crap, and the vicious crap cycle continues. A lot of this stems from people not knowing what they don’t know.

The reality is that a randomized controlled trial is not a simple construct. I’m not a cardiologist, so if I see a patient with a complicated heart problem, I don’t say, “Well, Mr. Smith, I took my cardiology course in medical school, and I’ve treated a few patients with simple heart problems, so I can TOTALLY figure out your problem and take care of you.” I consult a cardiologist. What’s mind-boggling is that most researchers who have never had ANY training in randomized study design won’t hesitate to bungle their way through one. And these trials, on the whole, are designed (unintentionally) to fail. The amount of wasted time (on the part of researchers, graduate students AND subjects), money, and opportunity to do meaningful research is nothing short of phenomenal. It borders on the unethical. And we’re back to the crap cycle again. So, to all of you exercise physiologists…hire a damn methodologist already (I’m available…)

So, case in point: Hoffman JR et al, Effects of protein supplementation on muscular performance and resting hormonal changes in college football players. Journal of Sports Science and Medicine. 6: 85-92, 2007.

As usual, any extensive commentary by me is in [ ]’s. I can’t guarantee it’s not going to be snide.


My first impression of the title was, “My God, do we need ANOTHER protein study?” but if these authors’ review is correct, then yes, yes we do. Hoffman et al. point out that there have been studies (many of which I am admittedly unfamiliar with, and others that I am familiar with) showing that a) untrained individuals don’t get a whole lot out of protein supplementation; b) protein supplementation seems to improve recovery; c) protein may decrease cortisol levels (in untrained individuals); d) protein affects resting testosterone levels; e) protein increases protein synthesis; and f) many college athletes aren’t getting enough protein. But there apparently haven’t been ANY studies looking at the effects of long-term protein supplementation (long-term meaning 12 weeks) on strength, power, body composition, and resting levels of various hormones. So by golly, that’s what they set out to do.


Twenty-one male strength and power athletes were recruited for this study. They were all athletes from the college’s football team with at least 2 years of resistance training experience. The athletes were not allowed to use any other nutritional supplements and did not use any steroids [and the researchers determined this by giving them a questionnaire, because we all know that asking a scholarship athlete in the NCAA, which prohibits steroid use, whether he’s used steroids in the past year is most definitely a very reliable way to catch steroid use.]

[How they decided on 21 subjects is a mystery to me. Inadequate sample size is one of the CLASSIC reasons why a trial will fail. “Ten subjects in each group should be enough,” is NOT a sample size calculation. You think I’m joking when I put that in quotations, but that’s an actual quote from a researcher who shall remain nameless.]

Subjects were “randomly assigned” to either a protein supplementation (PR) or placebo group (PL). [No idea how this random assignment was carried out. At least the HIIT vs. Steady State one said they picked names out of a hat.]
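For contrast, here’s what a reportable allocation procedure could look like: a minimal Python sketch of a shuffle-and-split allocation with a recorded seed, so anyone can re-run it and get exactly the same groups. The subject IDs, the seed, and the rule that the odd subject goes to PR are all hypothetical; nothing here describes what Hoffman et al. actually did. A real trial would also conceal the allocation from whoever enrolls the subjects.

```python
import random

def randomize(subject_ids, seed):
    """Shuffle-and-split 1:1 allocation with a documented seed (hypothetical procedure)."""
    rng = random.Random(seed)  # recording the seed makes the allocation auditable
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = (len(ids) + 1) // 2  # with an odd count, PR gets the extra subject
    return {"PR": sorted(ids[:half]), "PL": sorted(ids[half:])}

# 21 hypothetical subject IDs, S01..S21
subjects = [f"S{i:02d}" for i in range(1, 22)]
allocation = randomize(subjects, seed=2007)
print(len(allocation["PR"]), len(allocation["PL"]))  # 11 10
```

Re-running with the same seed reproduces the same groups, which is the whole point: the procedure can be reported in one sentence and checked by anyone.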

Both groups got the same training program (I believe this study was done during their off-season): 4 days per week on a 2-day split routine (roughly upper vs. lower body, with triceps on day 1 and biceps on day 2). The actual routine isn’t that important for this study’s purpose, just that they all got the same one. Workouts were supervised by study personnel.

The subjects were tested for all outcomes at three points in the study period: before training (pre), at 6 weeks (mid), and after 12 weeks (post). Testing involved serum total testosterone, growth hormone, IGF-I, and morning cortisol, as well as DEXA scans for percent body fat, bone mineral density, and tissue composition (bone vs. fat vs. non-bone lean tissue). All bloodwork was done fasting.

In addition, all subjects underwent strength testing for their 1-RM for squat and bench press, and Wingate testing for anaerobic power (peak power, mean power, total work, rate of fatigue). Subjects also completed a 3-day dietary recall every week. (Three-day dietary recalls are pretty standard in the training research world, even though they’re notoriously unreliable. Asking why this is, or why they continue to be used, is like asking why hot dogs come in packs of 8 and hot dog buns come in packs of 12. It just defies human logic.)

[Before I tell you all about the actual intervention, I just want to say that nowhere in this article is a PRIMARY outcome defined. This is also typical of training studies. They just pile on a smorgasbord of outcomes and hope that something falls out. I think I counted 13 outcomes. Possibly all of equal importance. Or not. When you do that many significance tests, SOMETHING is going to come up significant, whether it’s important or not. And more than likely, something is going to come up significant when it actually, in reality, isn’t. The list of articles I _want_ to write in terms of opinion pieces and tutorials has now become massive.]
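To see why piling on outcomes is a problem, here’s a small Python simulation, assuming the 13 outcomes are independent and that protein truly does nothing for any of them. Under a true null hypothesis a p-value is uniformly distributed between 0 and 1, so each simulated “study” just draws 13 uniform p-values and checks whether any of them fall below 0.05. Real outcomes in a study like this are correlated, so treat this as an illustration, not a correction.

```python
import random

def prob_any_false_positive(n_tests, alpha=0.05, n_sims=20_000, seed=1):
    """Monte Carlo estimate of P(at least one p < alpha) when every null is true.

    Assumes independent tests: under a true null, each p-value is Uniform(0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One simulated "study": n_tests null p-values, flag it if any look significant.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_sims

estimate = prob_any_false_positive(13)
print(f"13 outcomes, no real effects: {estimate:.3f}")  # should land near 1 - 0.95**13 (about 0.487)
```

In other words, with 13 outcomes and nothing going on, you’d get at least one “significant” finding in roughly half of all studies.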

The protein group got protein powder packets. Each packet contained 42g of protein, 18g of carbohydrate (maltodextrin) and 3g of fat. This powder was a proprietary blend of milk protein concentrates, whey protein concentrate, L-glutamine and dried egg white. Total caloric value: 260 kcal. I don’t think JSSM has the standard editorial policy of declaring conflicts of interest or lack thereof, because there isn’t the standard statement anywhere in this article.

The placebo group got, essentially, maltodextrin packets. Each packet contained 2g of protein, 63g of carbohydrate and 2g of fat. Total caloric value: 260 kcal.
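Quick sanity check on those calorie totals, using the general Atwater factors (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat; these are standard approximations, not numbers from the paper). The protein packet works out to about 267 kcal and the placebo to about 278 kcal. Gaps like this can come from label rounding or from which conversion factors were used, so this is a reason to ask the question, not proof that the packets were mismatched.

```python
# General Atwater factors in kcal per gram (an approximation; labels often round).
ATWATER = {"protein": 4, "carbohydrate": 4, "fat": 9}

def kcal(protein_g, carbohydrate_g, fat_g):
    """Energy content of a packet from its macronutrient grams."""
    return (protein_g * ATWATER["protein"]
            + carbohydrate_g * ATWATER["carbohydrate"]
            + fat_g * ATWATER["fat"])

protein_packet = kcal(42, 18, 3)   # 168 + 72 + 27
placebo_packet = kcal(2, 63, 2)    # 8 + 252 + 18
print(protein_packet, placebo_packet)  # 267 278
```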

[Blinding is by far the biggest challenge with supplement studies. And unfortunately, the authors haven’t commented on it at all. They do mention that the study was done in a “double-blind” fashion, but that’s about it. We don’t know whether the two packets looked the same, tasted the same, or had the same texture, or, for that matter, who was actually blinded. (Double-blind just refers to two parties being blinded, but studies can have up to 4 or 5 parties involved. In this case, there are at least 3, and potentially more: the subjects, the trainers, the person dispensing the packets, the researchers themselves, the lab techs, who might also be the researchers, and the analysts.) So while this might have been a great placebo, it’s impossible for us to tell because they haven’t reported on it. Oh look, another tutorial topic!]

Stats (for the geeks):

The two groups were compared using repeated measures ANOVAs, with post-hoc tests in the case of a significant ANOVA. Pre- and post-training values in each group were also compared using t-tests, and Pearson correlation coefficients were used to look at selected correlations. “Effect size” calculations were used to determine the magnitude of the treatment effect.

[So, multiple ANOVAs and multiple t-tests. In my world, this is called “data mining”]


With respect to diet, both groups consumed about the same number of calories. The PL group consumed, on average, 3139 kcal (SD 300) and the PR group 3072 kcal (SD 241). The groups did differ in their carb and protein intake (hardly a surprise, but the authors did a significance test on it anyway to make sure it was _statistically_ different. Table 2 alone has 8 significance tests. By around 12 significance tests, there’s nearly a 50/50 chance that you’re going to get a p-value less than 0.05 when you shouldn’t, i.e., a Type I error.)

Both groups experienced statistically significant gains in their 1-RM lifts. The PL group went from a mean of 162.8 kg (SD 24.2) to 174.1 kg (SD 23.3) for their squat, while the PR group went from a mean of 158.5 kg (SD 38.5) to 182 kg (SD 38.2). This translates to a 9.1 kg (SD 11.9) improvement for the PL group and a 23.5 kg (SD 13.6) improvement for the PR group!

Bench press was not as dramatic. The PL group went from 122.7 to 131.2 kg (SD 12.2 at both time points) and the PR group went from 120.7 kg (SD 21.1) to 132 kg (SD 22.0), translating to an 8.4 kg (SD 6.9) gain for the PL group and an 11.6 kg (SD 6.8) gain for the PR group.
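Since the authors used “effect size” calculations, here’s a rough Python sketch of one common version: Cohen’s d computed on the change scores with a pooled SD. This review doesn’t state the group sizes, so the 11 (PR) and 10 (PL) split below is an assumption consistent with 21 subjects, and the paper may well have used a different effect size formula. With these inputs, the squat effect comes out large (d of roughly 1.1) and the bench effect moderate (d of roughly 0.5).

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized difference in means (Cohen's d with a pooled SD)."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)

# Change scores (kg) as reported: PR first, then PL. Group sizes 11 and 10 are assumed.
squat_d = cohens_d(23.5, 13.6, 11, 9.1, 11.9, 10)
bench_d = cohens_d(11.6, 6.8, 11, 8.4, 6.9, 10)
print(f"squat d = {squat_d:.2f}, bench d = {bench_d:.2f}")
```

A large-looking d from a tiny sample is still an imprecise estimate, which is exactly the problem with the squat result.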

In terms of the Wingate test for power and fatigue, there were negligible differences between the two groups. (Table 3 in this section has 9 between-group significance tests and 16 within-group significance tests, for a grand total of 12 + 9 + 16 = 37 tests so far.)

When it comes to the hormone outcomes, it starts to get really ridiculous. No difference was detected within either group with respect to testosterone levels for pre- vs. mid- and pre- vs. post- training; and no difference was detected between groups for any of the time points (5 more tests).

A significant difference in cortisol levels was not detected either, except at one time point. The PL group apparently had a statistically significant decrease in cortisol between the pre- and mid-tests, and was also found to be statistically significantly lower than the PR group at the mid-point testing. (5 more tests, for a running tally of 47.)

For IGF-I and growth hormone levels, no statistically significant differences were found (another 10 tests).

[So the grand total of significance tests in this study appears to be 57–with 9 of these tests being statistically significant.]

Perhaps the saddest results were the body composition outcomes (these come from their Table 3, so this paragraph doesn’t add to the carnage of the tally). The PR group gained 1.4 kg (SD 1.9) of lean body mass, while the PL group gained 0.1 kg (SD 1.4).

[Understanding why this is disappointing requires understanding a little bit about standard deviations. A standard deviation is a measure of how much spread there is around the mean. To use a mean and standard deviation to describe your data, the data should be roughly normally distributed (i.e., the classic bell curve). Since the normal distribution is predictable, by reporting a mean and standard deviation you are saying that about 68% of your data lies within one standard deviation of the mean, and about 95% lies within two standard deviations. So, in the case of lean body mass gain, while it looks like a nice difference between the two groups, it’s really not that impressive when you consider that 95% of the PR group lies between _LOSING_ 2.4 kg and gaining 5.2 kg (1.4 ± 2 × 1.9). And since the normal distribution is symmetrical, we can make a rough (very rough) inference that half of the subjects in the group gained 1.4 kg or less (which includes losing lean body mass). However, what’s saddest about this result is that it could have been prevented if only the investigators had planned for this analysis ahead of time. It’s entirely possible that the result might not have changed in terms of significance, but at least we would have more certainty around that fact. As it stands now, the 95% confidence interval around the mean is 0.12 to 2.68 kg, which means that, in the context of repeated experiments, a lean body mass gain of only 0.12 kg would be a plausible value. Not so great, huh.]
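If you want to check these numbers yourself, here’s a Python sketch. It assumes 11 subjects in the PR group (the group sizes aren’t given in this review, but 11 reproduces the 0.12 to 2.68 kg interval to two decimal places) and uses the tabled two-sided 95% t critical value for 10 degrees of freedom (2.228). Note the difference between the two calculations: mean ± 2 SD describes where individual subjects fall, while the confidence interval describes how precisely the group mean is estimated.

```python
import math

# Lean-mass gain in the PR group, as reported: mean 1.4 kg, SD 1.9 kg.
mean, sd = 1.4, 1.9
n = 11               # assumed PR group size (not stated in this review)
T_CRIT_10DF = 2.228  # two-sided 95% t critical value for n - 1 = 10 degrees of freedom

# Where roughly 95% of individual subjects would fall, if changes are about normal.
spread = (mean - 2 * sd, mean + 2 * sd)

# 95% confidence interval for the group's mean change (t-based).
half_width = T_CRIT_10DF * sd / math.sqrt(n)
ci = (mean - half_width, mean + half_width)

print(f"individuals: {spread[0]:.1f} to {spread[1]:.1f} kg")
print(f"mean (95% CI): {ci[0]:.2f} to {ci[1]:.2f} kg")
```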

The authors state as their MAIN conclusions that:

1) Athletes don’t get enough protein in their diet, but when given protein supplements, they can meet their needs. The authors don’t report which recommendation they were following, even though there are multiple articles on recommended protein intake for athletes.

2) Protein supplementation seems to make your lower body stronger, but not your upper body. I want you to read that sentence again and agree with me that it is quite possibly one of the most absurd sentences ever written about a cause-effect relationship. It’s not exactly what they wrote, but they did write, “Results of upper body strength…did not support the efficacy of a 12-week protein supplementation period in experienced trained athletes.” In my opinion, there are three possible explanations for why this study turned out the way it did: 1) their sample size was too small to detect a difference (i.e., the trial was doomed to fail); 2) differences in upper body strength (i.e., bench press) are difficult to detect because increases in upper body strength are proportionately smaller than increases in lower body strength (which basically requires a larger sample size, if it’s deemed important enough); or 3) there really is no benefit to consuming protein with regard to upper body strength, and somehow there actually IS an inexplicable difference between your upper body and lower body in terms of the effects of protein (which makes the least sense of all). Oh, and 4) the “significant” result the researchers found for the squat improvements is actually spurious, and in reality, protein may do squat for the squat.
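To put explanation 2 in numbers, here’s a Python sketch of the standard normal-approximation sample size formula for comparing two means (two-sided alpha of 0.05, 80% power). Big caveat: I’m plugging in the _observed_ differences in change scores (14.4 kg for squat, 3.2 kg for bench) and rough common SDs (12.8 and 6.85 kg) purely for illustration; a proper calculation uses the smallest difference worth detecting, decided before the study starts. Under these assumptions, the squat needs roughly 13 subjects per group, while the bench needs roughly 72 per group.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2,
    rounded up. It slightly underestimates n for small samples (no t correction).
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd**2 / delta**2)

# Illustration only: observed differences in change scores and rough common SDs.
n_squat = n_per_group(delta=14.4, sd=12.8)
n_bench = n_per_group(delta=3.2, sd=6.85)
print("squat:", n_squat, "per group")
print("bench:", n_bench, "per group")
```

With roughly 10 subjects per group, a bench press difference of that size was never going to be reliably detected.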

As a final re-analysis, I wanted to revisit the squat result and generate that handy-dandy 95% confidence interval. Our pooled standard deviation is 12.8 and the standard error of the difference is 5.6, which means the 95% confidence interval for the difference between the two groups in the increase in squat 1RM (which was 14.4 kg) is 3.3 kg to 25.5 kg. Again, in our interpretation of this interval, in the context of repeated experiments we would expect to see a PR group increase their squat by as little as 3.3 kg more than a PL group, or by as much as 25.5 kg more. That’s pretty damn imprecise, eh?
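Here’s that calculation as a Python sketch, using an equal-variance (pooled) t interval. Two assumptions: group sizes of 11 (PR) and 10 (PL), which this review doesn’t state, and the tabled two-sided 95% t critical value for 19 degrees of freedom (2.093). The interval is symmetric around the 14.4 kg difference; with the t value it comes out at roughly 2.7 to 26.1 kg, and using 1.96 instead gives roughly 3.4 to 25.4 kg, so either way the upper end sits around 25 to 26 kg.

```python
import math

def diff_ci(m1, sd1, n1, m2, sd2, n2, t_crit):
    """CI for a difference in means using a pooled SD (equal-variance t interval)."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    se = sp * math.sqrt(1 / n1 + 1 / n2)  # standard error of the difference
    diff = m1 - m2
    return diff - t_crit * se, diff + t_crit * se

# Squat 1RM change (kg): PR 23.5 (SD 13.6), PL 9.1 (SD 11.9); n = 11 and 10 assumed.
T_CRIT_19DF = 2.093  # two-sided 95% t critical value, 19 degrees of freedom
low, high = diff_ci(23.5, 13.6, 11, 9.1, 11.9, 10, T_CRIT_19DF)
print(f"95% CI for the difference: {low:.1f} to {high:.1f} kg")
```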

There were a variety of reasons why I picked this article to review. But I think the most striking piece of crap that came out of this article was the sheer number of significance tests. This whole p

At an alpha level of 0.05, by the time you’ve done 20 (independent) tests, the probability of at least one Type I error goes up to about 0.64; at 40 tests it’s about 0.87, and at 57 it’s about 0.95. So one really has to ask whether the significant findings of this study are real or not, because they did at least 57 tests. Pretty sick, huh?
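These probabilities come from a one-line formula: with k independent tests each run at alpha, the chance of at least one false positive is 1 - (1 - alpha)^k. Here’s a Python sketch that tabulates it, plus the Bonferroni-adjusted threshold (alpha / k), which is one simple correction for running many tests. Independence is an assumption; the tests in this study are surely correlated, so treat these as approximations.

```python
def familywise_error(k, alpha=0.05):
    """P(at least one false positive) across k independent tests when all nulls are true."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 12, 20, 40, 57):
    print(f"{k:>2} tests: {familywise_error(k):.2f}")

# Bonferroni: test each hypothesis at alpha / k to keep the familywise rate at or below alpha.
bonferroni_57 = 0.05 / 57
print(f"Bonferroni threshold for 57 tests: {bonferroni_57:.5f}")
```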

The bottom line: Crap study for crap results. Doesn’t help us make a decision either way whether to continue using protein or not. I guess it’s good pilot data for a future study…
