Beta-alanine vs. um…stuff. No one really wins.

Ok, I’m behind the times. By about 3 months. But you know, it’s ’cause I have to go searching for this stuff all by myself! Maybe we should consider this a bit of catch-up, since the blog was started only a month ago. If you see a good study, or mention of one, send me the freaking reference! Some “cutting edge” blogger I turned out to be, eh…

I picked this paper out after listening to the FitCast (yes, thank you again, Kevin) and hearing about beta-alanine. Of course, doing a quick search on JPFitness revealed that people were talking about it in December, and that not only had it been discussed on T-Nation already, with an article by Dr. Stout, but I had inadvertently picked Dr. Stout’s paper for this week’s review. It’s always risky to criticize a study that has already received fairly rave reviews, but…here goes.

Stout JR, Cramer JT, Mielke M, O’Kroy J, Toro DJ, Zoeller RF. Effects of twenty-eight days of beta-alanine and creatine monohydrate supplementation on the physical working capacity at neuromuscular fatigue threshold. Journal of Strength and Conditioning Research, 20(4):923-931, 2006.


Beta-alanine has been shown to increase carnosine levels in the muscle. Increased levels of carnosine have been suggested to be associated with buffering action and may assist in decreasing “neuromuscular fatigue” by buffering hydrogen ions that are produced during muscular work. Creatine has also been shown to decrease neuromuscular fatigue, though this aspect of creatine benefit has been less studied than its other effects. So, the researchers decided to test beta-alanine and creatine, alone and together, to see whether there would be a combined effect. To this end, they used an incremental cycle ergometer test called the physical working capacity fatigue threshold (PWCft).

[Now, a note about the PWCft. Admittedly, I did not have the time to go and dig into the journal, Ergonomics, to find the original studies where the PWCft test was developed–mostly because it would have involved hoofing it to the library to grab the hard-cover bound copies of Ergonomics, because Ergonomics only goes back to 1996 in our electronic collection here.

BUT, according to Stout et al, the PWCft is a test that “…utilizes the relationship between EMG amplitude and fatigue during submaximal cycle ergometry to identify the power output that corresponds to the onset of neuromuscular fatigue. The PWCft represents the highest power output that results in a non-significant (p>0.05) increase in muscle activation of the vastus lateralis over time.”

So, I have three immediate comments about this study before I’ve even read the methods section. Again, I will emphasize the fact that I have not gone back to read about the development of the PWCft, so the comment here is based solely on the authors’ description of what it is:

1) It seems that PWCft is basically a probabilistic test, and operates under a fundamental fallacy that a p-value that is greater than 0.05 is equivalent to implying that “no difference” exists in muscle activation after a certain power output has been achieved, and thus, assigns that power output to mean that the neural component to force generation has tapered off, and calls that “neuromuscular fatigue”. However, even in the face of a correct interpretation of a p-value greater than 0.05, which would be, “there is insufficient evidence to show that a difference exists,” (which would not necessarily mean that one doesn’t exist–we just can’t tell), we’re still looking at using a non-significant p-value in a series of repeated significance tests to delineate not-fatigued from fatigued. Now we know the error rate on a single significance test is 5% in terms of finding a significant difference when one does not truly exist, but what are the chances that we’re going to make the opposite error, the Type II error, which is not finding a difference when one DOES truly exist? And that has more to do with power (which is 1 minus the beta level) than it does the alpha level. Perhaps I will discuss the beta level later this week. Either way, using a non-significant p-value to determine anything is dubious at best.

2) We’re looking at surface EMG here. Now, I understand that the PWCft is validated, and reliable and sensitive to change (all important components to validity), but I find myself skeptical about it from the get-go, knowing that surface EMG is incredibly imprecise and can be affected massively by very small factors such as electrode placement, skin moisture, atmospheric moisture, subcutaneous body fat, and luck. For the most part, there seems to be a lot of work still being done to investigate processing methods to extrapolate more information than “muscle is on, muscle is off” from surface EMG data. I’m not an EMG expert though, so if this was a real peer-review, I would either defer to one, or consult one.

3) This is NOT, I repeat, NOT a study on weightlifting. It’s a study on cycling. So before you get all excited about it, I’ll ask you: If there was a study showing that six sets of 30 second all-out intervals of cycling made your legs stronger, would you start doing 6 sets of 30 second all-out intervals of squats? (The answer is, maybe, but not based on a cycling study!) IF we assume that the PWCft is a valid measure of fatigue, we are still looking at fatigue over time during submaximal cycling (i.e. a test where you sit on a bike and pedal with increasing resistance until you can’t maintain a pre-set RPM), not periodic lifting (i.e. a situation in which you go through a short period of activity with a constant load, rest, and do another short period of activity with a constant load, which may or may not be different than the last one). There is a certain amount of “fatigue is fatigue” to this, but fatigue is also contextual–and in particular “neuromuscular” fatigue–whatever _that_ means (which could be influenced by things like neurotransmitter reuptake, vesicle release, kinetics in the synaptic cleft–all of which would be HUGELY affected by the fact that you take a minute of rest between sets).
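To make point 1 a bit more concrete, here is a minimal sketch of how easy it is to miss a real difference with a small sample. It uses a plain two-sided z-test as a stand-in (this is NOT the PWCft procedure, which I haven't read), and the effect size, standard deviation and sample size are numbers I made up for illustration.

```python
from statistics import NormalDist

def z_test_power(true_diff, sd, n, alpha=0.05):
    """Power of a two-sided z-test on the mean of n observations.

    true_diff: the real mean change we are trying to detect
    sd:        standard deviation of the observations (treated as known)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = true_diff / (sd / n ** 0.5)          # true effect in standard-error units
    # Probability that the test statistic lands in either rejection tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical numbers, NOT taken from the paper
power = z_test_power(true_diff=5.0, sd=15.0, n=12)
beta = 1 - power   # chance of a Type II error
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

With these made-up inputs the power comes out at roughly 0.2, so a real difference would go undetected about four times out of five. A p-value above 0.05 in that situation says very little about whether the difference is there.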

So based on these issues, before I even delve into the guts of the study itself, I would be leery of starting beta-alanine supplementation based solely on this study for improvement in lifting performance. Especially if it’s expensive and I’m poor. However, if it’s cheap and I can afford it, I _might_ try it, but that decision would not be based on anything but anecdotal hype–which, so long as you know that’s what your decision is based on, is fine. Hey, I’m just as desperate as the next guy.

And yes, the opening square bracket of this comment is waaaaay the hell up there, isn’t it?]


As a general comment, the reporting of this trial was actually quite sub-par. There was a lot of detail missing.

There was no mention of inclusion or exclusion criteria. The authors mentioned that none of the subjects had ingested creatine or any other dietary supplement for at least 12 weeks before the start of the study. What we don’t know are things like training history, where the subjects were recruited from, how many subjects were considered but excluded, why subjects were excluded if they were excluded, etc. We don’t know who this study was supposed to target, and so making statements of generalizability (i.e. the population to which we are supposed to most closely extend these results) is difficult.

The method by which the randomization sequence was generated was not discussed, nor was randomization concealment, who had access to the code, what the randomization scheme itself was, and whether there were any stratifying factors.

The authors stated that the subjects were blinded because they got identical tasting and appearing supplements, and that they were randomized in a “double-blind” manner. Who the other blinded party was is not known (trainer? evaluator? investigator? supplier?)

Treatment groups:
There were four arms to this trial: 1) the placebo group (34g dextrose), 2) the creatine group (34g dextrose and 5.25g creatine), 3) the beta alanine group (34g dextrose, 1.6g beta alanine), 4) the combined group (34g dextrose, 5.25g creatine and 1.6g beta alanine). The groups took their assigned supplement dissolved in 16oz of water, 4 times per day for the first 6 days and then twice per day for the remaining 22 days.

[This doesn’t happen very often, so it’s probably not that important, but the weight of the supplement in each of these groups is different. There have been trials that have failed because of an oversight such as weight, and subjects unblinding themselves–or at least figuring out they’re not on the active treatment arm. Mostly, this comment is a teaching point.]
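For anyone who wants the arithmetic behind the weight comment and the dosing schedule, here is a quick sketch. The per-serving grams and the schedule are the ones listed above; the function and variable names are just mine.

```python
# Grams per serving, as listed in the treatment-group description
SERVINGS = {
    "placebo":      {"dextrose": 34.0},
    "creatine":     {"dextrose": 34.0, "creatine": 5.25},
    "beta-alanine": {"dextrose": 34.0, "beta-alanine": 1.6},
    "combined":     {"dextrose": 34.0, "creatine": 5.25, "beta-alanine": 1.6},
}

# Schedule: 4 servings/day for 6 days, then 2 servings/day for 22 days
LOADING_DAYS, LOADING_PER_DAY = 6, 4
MAINTENANCE_DAYS, MAINTENANCE_PER_DAY = 22, 2

def serving_weight(group):
    """Total grams of powder in one serving for a group."""
    return sum(SERVINGS[group].values())

def total_servings():
    """Number of servings taken over the whole 28 days."""
    return LOADING_DAYS * LOADING_PER_DAY + MAINTENANCE_DAYS * MAINTENANCE_PER_DAY

def total_intake(group, ingredient):
    """Grams of one ingredient consumed over the whole trial."""
    return SERVINGS[group].get(ingredient, 0.0) * total_servings()

for group in SERVINGS:
    print(f"{group:>12}: {serving_weight(group):.2f} g per serving")
print("creatine over 28 days:", total_intake("combined", "creatine"), "g")
```

The servings range from 34 g (placebo) to just under 41 g (combined), which is the difference a subject could in principle notice.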

The EMG electrode was placed on the vastus lateralis, in a standardized fashion (shave the skin, scrape it with sandpaper, make a standard measurement–in this case, midway between the greater trochanter and the lateral femoral condyle–and stick the electrode on). The PWCft was determined as described above. Subjects were asked to pedal at 60 W, and 70 rpm. The power output was increased every 2 minutes by 30 W until the subject couldn’t maintain 70 rpm. The authors also did a spot sample of 12 subjects to test the reliability of the PWCft. They claim to have calculated the intraclass correlation coefficient, but called it “r”, which is usually the symbol for a Pearson correlation coefficient, not an ICC (whose symbol is generally ICC). There’s a reference to a study by Weir, which describes how to calculate r, but I haven’t gotten a copy of it yet. Just for the record, an ICC is VERY different from an r. So, maybe it’s an ICC, or maybe it’s a Pearson correlation coefficient. When I get the Weir study, I’ll find out. The major issue here is that an ICC is generally a good tool to determine test-retest reliability, while a simple correlation coefficient is completely inappropriate.
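Here is a small sketch of why the r-versus-ICC distinction matters for test-retest reliability. The data are invented: five pretend subjects whose retest score is exactly 30 W higher than their first score. I've used the ICC(2,1) form (two-way random effects, absolute agreement, single measure), which is my assumption; I don't know which form, if any, the authors used.

```python
def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_2_1(x, y):
    """ICC(2,1) for two sessions: two-way random, absolute agreement, single measure."""
    n, k = len(x), 2
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Invented data, NOT from the paper: every retest is 30 W higher than the test
test = [150, 170, 190, 210, 230]
retest = [w + 30 for w in test]

r = pearson_r(test, retest)
icc = icc_2_1(test, retest)
print(f"Pearson r = {r:.2f}, ICC(2,1) = {icc:.2f}")
```

The Pearson r is a perfect 1 because the ranking of subjects is preserved, while the ICC drops to about 0.69 because it penalizes the systematic 30 W shift between sessions. A test that drifts that much between sessions is not "perfectly reliable", whatever r says.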

The authors performed two ANCOVAs with the pretest PWCft as the covariate. [An ANCOVA is an analysis of co-variance. You can use it like an ANOVA, but it’s useful when you want to control for an extra variable–in this case, pre-test PWCft.] It’s not entirely clear why they did two, because I only see one ANCOVA described in this section. The authors also described using a least squares regression to examine the linearity of the relationship between the pretest PWCft and the post-test PWCft within each of the treatment groups. In the case of a significant p-value from an ANCOVA, they used Bonferroni-corrected post-hoc tests to suss out where the differences were. [I really wish they had just told us WHICH post-hoc test they used, since I can think of 3 off the top of my head.] They also used a partial eta squared to calculate effect size.
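As a quick aside on what “Bonferroni-corrected” buys you, here is a sketch of the arithmetic for four groups. I'm assuming all six possible pairwise comparisons were run and that the tests are independent (they aren't quite, since groups are shared between comparisons), so treat the uncorrected figure as a rough guide only.

```python
from itertools import combinations

GROUPS = ["placebo", "creatine", "beta-alanine", "combined"]
ALPHA = 0.05

# Every possible pair of groups (assumption: all of them were compared)
pairs = list(combinations(GROUPS, 2))

# Bonferroni: split the overall alpha evenly across the comparisons
per_test_alpha = ALPHA / len(pairs)

# Chance of at least one false positive with NO correction,
# under the simplifying assumption that the tests are independent
uncorrected_familywise = 1 - (1 - ALPHA) ** len(pairs)

print(f"{len(pairs)} comparisons")
print(f"per-test alpha after Bonferroni: {per_test_alpha:.4f}")
print(f"uncorrected familywise error:    {uncorrected_familywise:.2f}")
```

So each individual comparison has to clear a bar of about 0.008 instead of 0.05, in exchange for keeping the overall false-positive rate at or below 5%.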

[This is one of those statistical sections that makes very little sense to me, because I’m not aware of how you use regression analysis to determine linearity, because linearity is usually a REQUIREMENT of least-squares regression. I have a suspicion that what this means is that they fiddled with the regression equation with different transformations of the predictor variables, until they got the highest R^2 value, but I’m not sure. If that’s what they did, then that’s totally inappropriate too, because the highest R^2 value doesn’t imply that the relationship is linear either. The reference for two of their statistical methods was the manual for SPSS, which is a statistical analysis package. I’m not saying that this analysis was necessarily inappropriate, and I’m completely willing to concede that it’s possible that it’s just way over my head, but this is one of the most confusing descriptions of statistical methods I’ve read in a paper, and gut-feeling-wise, is reminiscent of some pretty random stats.]
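To show what I mean about R^2 not implying linearity, here is a toy sketch. The data are invented (y is exactly x squared, so the relationship is curved by construction), and I fit an ordinary straight line to it by least squares. None of this reproduces anything the authors did.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Invented data with a deliberately NON-linear relationship
x = list(range(1, 11))
y = [v ** 2 for v in x]

slope, intercept, r_squared = fit_line(x, y)
residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]

print(f"R^2 = {r_squared:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

The straight line gets an R^2 of about 0.95 on data that is a pure curve. What gives the curve away is the residuals: positive at both ends and negative in the middle. If you want to check linearity, you look at that pattern rather than at the size of R^2.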

Results (finally):

Fifty-one subjects were recruited for this study. Thirteen in the placebo group, 12 in the creatine group, 12 in the beta alanine group, and 14 in the combined group. Presumably no one dropped out of the study or was lost to follow-up, but there are no “n’s” or sample sizes published in the statistical reporting, and no report on whether there were any drop-outs or losses to follow-up, so it’s hard to tell.

Results were reported as means with standard errors.

[Unfortunately, a standard error is not a very useful measure of variance, because it’s more a measure of precision (the standard error is derived by dividing the standard deviation by the square root of the sample size–in other words, it describes how precisely the mean was estimated, not how spread out the individual subjects were). Equally unfortunate is the lack of a published sample size in each group, because it’s impossible to calculate the standard deviation from the standard error without the sample size, so I really have no good way to tell you much about this data or these results. And I refuse to make the assumption that there were no drop-outs to calculate the standard deviation using the original sample sizes. Why they chose to report the data this way is completely mind-boggling to me, and only reinforces my notion that there were some pretty random decisions made with respect to the statistical analysis.]

I’m sorely tempted to stop the review here, because telling you the rest of the results can only be done in a pretty tainted kind of way. There are a lot of random things reported in the results section that, again, are either WAY over my head, or just downright inappropriate. There is no good reason why the analysis was as complicated as it was reported to be.

So, taintedly, here are what I think are the important results–and I’ll just spare you from the rest of this mess. Keep in mind that reporting these results in this way, goes against every statistical fibre in my body and that in NO WAY SHAPE OR FORM, should you attempt to take these results as strong, or even moderate evidence for anything.

The primary outcome of this study was the PWCft.

The placebo group went from a mean of 215.8 to 211.2 W. Total mean change: -4.6 W

The creatine group went from a mean of 172.5 to 183.8 W. Total mean change: 11.3 W

The beta-alanine group went from a mean of 170 to 198.8 W. Total mean change: 28.8 W

The combined group went from a mean of 190.7 to 214.3 W. Total mean change: 23.6 W
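If you want to check the subtraction yourself, here is a small sketch that recomputes each change from the pre and post means as printed above, plus the change relative to placebo. The "relative to placebo" column is my own addition for illustration, not something reported in the paper, and none of this says anything about statistical significance.

```python
# (pre, post) mean PWCft in watts, as printed above
MEANS = {
    "placebo":      (215.8, 211.2),
    "creatine":     (172.5, 183.8),
    "beta-alanine": (170.0, 198.8),
    "combined":     (190.7, 214.3),
}

# Raw change within each group, rounded to one decimal like the text
changes = {group: round(post - pre, 1) for group, (pre, post) in MEANS.items()}

# Change relative to placebo (my own illustrative column, not from the paper)
vs_placebo = {group: round(delta - changes["placebo"], 1)
              for group, delta in changes.items()}

for group in MEANS:
    print(f"{group:>12}: change {changes[group]:+.1f} W, "
          f"vs placebo {vs_placebo[group]:+.1f} W")
```

Computed straight from those printed means, the combined group's change works out to 23.6 W.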

Notice that I did not report the standard errors here. I will just say that given that the standard errors were in the ballpark of 11 to 23, the standard deviations would have been HUGE if the sample sizes were anywhere close to the sizes that they were at the start of the trial.
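To put a rough number on “HUGE”, here is a what-if sketch using the relationship SD = SE x sqrt(n). It assumes the group sizes stayed somewhere between 12 and 14 (the starting sizes), which the paper does not confirm, so this is strictly hypothetical. I'm also only using the two ends of the standard error ballpark, not the actual per-group values.

```python
from math import sqrt

def sd_from_se(se, n):
    """Recover the standard deviation from a standard error and a sample size."""
    return se * sqrt(n)

# Ballpark ends of the reported standard errors, in watts
SE_LOW, SE_HIGH = 11.0, 23.0

# HYPOTHETICAL sample sizes: the starting group sizes, assuming no drop-outs
N_SMALL, N_LARGE = 12, 14

sd_low = sd_from_se(SE_LOW, N_SMALL)
sd_high = sd_from_se(SE_HIGH, N_LARGE)
print(f"implied SD range: about {sd_low:.0f} to {sd_high:.0f} W")
```

Under that assumption the standard deviations would land somewhere around 38 to 86 W, which is larger than any of the mean changes reported above.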

Nonetheless, significant differences were detected between the beta-alanine group and placebo group; and between the combined group and the placebo group. No significant difference was detected (which does NOT mean that there was no difference) between the creatine group and the placebo group; or between the creatine group and the beta-alanine group; or between the creatine and combined group.

Lastly, I think it’s worth mentioning that interpreting what a PWCft actually MEANS is very difficult, particularly in the context of comparisons between groups. What is a practically important difference (see my entry on Different Kinds of Important) in PWCft? Is 28.8 W a difference that reflects a PRACTICAL improvement in performance? That’s a question that I’m really at a loss to answer and would also defer to someone more qualified.

The bottom line:

So the bottom line, if we take the data at its face value (which I am VEHEMENTLY opposed to doing, but I know people will, so I may as well interpret it with this caveat of EXTREME CAUTION)–and keep in mind that we’re NOT looking at a weightlifting trial here, so it all applies to “neuromuscular fatigue” in the context of submaximal cycling:

1) There is some evidence to demonstrate that beta-alanine, with or without creatine, is better than a placebo in decreasing “neuromuscular fatigue”.
2) There is no evidence to demonstrate that beta-alanine is better than creatine in decreasing “neuromuscular fatigue”.
3) There is no evidence to demonstrate that there is any combined effect to taking beta-alanine and creatine together in decreasing “neuromuscular fatigue”.

So, the best case scenario is that I’m a doofus, and that the data can be taken at face value as I’ve told you above. The real-case scenario is that this study (given the way it was reported) was a dog’s breakfast of statistics, and that there were MASSIVE GAPING holes in reporting, such that comprehensive and concise interpretation of the results is impossible; AND there’s the additional problem of whether the PWCft is a valid outcome or not and whether the observed changes in PWCft were meaningful or not. Either way, I would say that if you’re already taking creatine, then keep doing it because creatine is well studied and its effects are well documented with some pretty good and robust studies out there. And if you’re thinking of taking beta-alanine instead of creatine, consider taking creatine instead, because if this study is the reason why you’re thinking of beta-alanine, you’d be making an unsupported faith decision. And if you’re thinking of taking both, think again, and consider just taking creatine because the best-case scenario indicates that there is no evidence to support taking both.

And if you’re considering the real-case scenario, you’d need a better study.

All in all, a very disappointing paper.
