Non-nutritive sweeteners: This is going to hurt.
I’m publishing this blog entry quietly (i.e. no social media promotion yet), because I doubt any major news people read my blog. I want to see how the media reports on this study, but also to timestamp my thoughts. So if you are one of my faithful readers, please don’t share this for about 2-3 weeks (though I doubt it will take that long, since The Independent has already written about it). And if someone influential is reading this, well, I guess I’ll find out.
I am both heartened and disheartened at the evolving movement of pro-evidence vs con-evidence. On the one hand, I think it’s wonderful that people are questioning everything. On the other hand, I think the vast majority of people who are doing it don’t know what they’re actually doing. Hearts are in the right place. Minds are not. It has already cropped up sparsely on social media, with comments prefaced with things like, “I haven’t read the whole study, just the abstract.” Please. Stop. Doing. This.
This blog post is going to hurt. It is going to be long and boring. But what I want to illustrate is how far you need to go to make sense of a study like this.
So let’s get down to it:
Azad MB, Abou-Setta AM, Chauhan BF et al. Nonnutritive sweeteners and cardiometabolic health: a systematic review and meta-analysis of randomized controlled trials and prospective cohort studies. CMAJ 189:E929-39, 2017.
This study is not simple. I read it on the day it came out (because I handily am a member of the Canadian Medical Association and therefore get access to the journal). I read it, and then I had to sleep on it, because I couldn’t organize my thoughts on it in the same day.
A systematic review is a research design that is used to summarize and synthesize literature. It is an experimental design because you form questions and a protocol before your search, and you don’t really know what you’re going to get out the other end. The design is meant to minimize selection bias (i.e. cherry-picking) in what literature is included, as well as to encourage comprehensiveness in terms of finding every last scrap of information that is humanly feasible.
A meta-analysis is a statistical method. Not every systematic review needs to have a meta-analysis, though most do. There are arguments on both sides of why this should or shouldn’t be a mandatory part of a systematic review that I won’t get into today. A meta-analysis is a method by which data from more than one study are combined to create a combined, or pooled, estimate of the effect of interest. Meta-analytical techniques are particularly useful when there is a collection of similar studies, each thought to have been too small on its own to demonstrate an effect. More “recently”, meta-analytical techniques have been used to try to “resolve” conflicting results from multiple studies. Substantial discussion over whether this is appropriate also exists.
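To make the mechanics concrete, here is a minimal sketch of fixed-effect (inverse-variance) pooling, the arithmetic at the heart of most meta-analyses. The effect estimates and standard errors below are invented for illustration, not taken from the CMAJ paper:

```python
import math

# Minimal fixed-effect (inverse-variance) meta-analysis.
# (effect estimate, standard error) for three hypothetical trials;
# these numbers are invented for illustration.
studies = [(-0.40, 0.30), (-0.10, 0.15), (0.05, 0.20)]

weights = [1 / se**2 for _, se in studies]  # weight = 1 / variance
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval of the pooled estimate (the "diamond" in a forest plot)
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
```

Because each study’s weight is the inverse of its variance, a single large or precise trial can dominate the diamond, which is exactly the dynamic you have to check for when reading a forest plot.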
The question at play in this study was, “Is routine consumption of nonnutritive sweeteners by adults and adolescents associated with adverse long-term cardiometabolic effects in randomized controlled trials and prospective cohort studies?”
The authors did not really define nonnutritive sweeteners explicitly, but mentioned that they include aspartame, sucralose and stevioside. This is important, because 80 RCTs (randomized controlled trials) were excluded as ineligible due to, “..ineligible nonnutritive sweetener intervention or comparator”. You’ll see why this is important when we get to the results.
In a systematic review, you plan your protocol, including your literature search, before you’ve really done much of one. This is to prevent you from biasing your opinion and unconsciously changing how you search to confirm your own biases. With the help of a librarian (usually), the goal of the literature search is to be as comprehensive as possible, as well as to use the power of a computer to do some of the exclusion work so that it cuts down on the amount of work that has to be done manually. This search strategy is not normally published in the main article because of publication constraints on number of pages. But it is routinely published in an online appendix. So, if you’ve read the study, and you haven’t read the appendix, then you haven’t really done your homework. Reading a search strategy is like reading programming code. It’s just sets of things being combined with AND, OR or NOT. In terms of specific sweeteners that were searched for, they looked for aspartame, neotame, aspartylphenylalanine, saccharin, benzosulfimide, sucralose, trichlorogalato-sucrose, cyclamates, goldswite, sucralyl, palatinite, xylitol, stevia, as well as their other names (including brand names), as well as a bunch of other sweeteners I’ve never heard of. Looks pretty comprehensive, which makes me wonder what non-nutritive sweeteners weren’t considered eligible, or were all 80 studies excluded based on ineligible comparators?
A minimum of 6 months of study duration was required. After their search was executed, including hand-searches through references as well as “grey searches” (where you basically comb as many other possibly relevant databases, including Google, for anything that might not have been indexed in the databases you ran your formal search through), they found 938 studies that needed to be hand-reviewed to decide whether they would meet inclusion and exclusion criteria. I really wish they had reported RCTs and cohorts separately, as it would give us a much clearer picture of whether they managed to capture it all.
To be included in the review, studies had to go through quality assessment. I’m happy that the authors chose to use the Cochrane Risk of Bias tool for RCTs, and less happy about the use of the Newcastle-Ottawa scale for cohorts. I’m also happy that the authors published their quality assessments in the Appendix (remember that?) because it enables a reader to see if they agree with the scoring. This is important later too! In terms of quality criteria, however, no studies were excluded from the analysis based on quality. This is both a strength and a weakness of the study. It’s a strength because it uses every available piece of evidence to inform the conclusion. It’s a weakness because both strong and weak studies are mathematically given the same treatment in informing this conclusion, so mathematically strong studies that are methodologically poor CAN drive the conclusion, which then leads to a conclusion based on weaker evidence. This weakness isn’t always there, but it takes looking at the results very carefully to see if it is.
I’m also very happy that the authors of the study used the PRISMA guidelines on publishing a systematic review. It really makes reviewing much easier when all of the necessary information is easily accessible.
Where this study starts to fall down is in the results.
In the two media sources I found (The Atlantic and The Australian; where The Atlantic basically quoted The Australian and didn’t say anything The Australian didn’t), the take-away message was, “Consuming food and drink containing artificial sweeteners could lead to weight gain and heighten risk of suffering from health issues including diabetes…” (http://www.theaustralian.com.au/news/latest-news/artificial-sweetners-linked-to-weight-gain/news-story/69f06f056402fd8c534bfa4612df54f8)
Before I get into the results, I want to say that I don’t think the authors have overstated their case in the paper itself. Their conclusions are cautious and fairly conservative. But I’m not convinced that even these conservative statements are well-supported by their findings and here’s why:
First off, this is really two studies rolled into one manuscript. There is a meta-analysis on randomized controlled trials and a separate analysis on cohort studies (i.e. studies that follow a single group of people over time whose behaviours differ to produce a variety of outcomes).
Let’s start with the randomized controlled trials. I’m not sure if I’ll get to the cohort section as it is a lot more work; and I’m not sure that it’s worth it after you’ve read this part.
The Randomized Controlled Trials- BMI
Two outcomes were reported in forest plots from randomized controlled trial data: BMI and weight change. Fortunately, there were only 7 trials included and there was quite a bit of useful summary data in the appendix too. When you look at the forest plot for BMI, there are two things to note: 1) The pooled estimate (the diamond) is driven primarily by the trial by Hsieh et al (2003); the other two trials do pull the estimate to the right, but not by much. 2) The confidence interval of the pooled estimate is almost the same width as Hsieh et al’s. So we can say that the conclusion that artificial sweeteners do not affect BMI is almost entirely dependent on the results from Hsieh et al.
At this point, if you want to take this topic seriously, you HAVE to look at Hsieh et al, because any flaws in that paper are going to transfer to this one. If you wanted to be super serious, you’d read all three, but because Madjd et al (2015) and Ferri et al (2006) don’t really change the fact that Hsieh’s confidence interval crosses the equivalence line (adding them to the analysis still results in the interval including the equivalence boundary), it’s less important.
So let’s look at: Hsieh MH, Chan P, Yuh-Mou S et al. Efficacy and tolerability of oral stevioside in patients with mild essential hypertension: A two-year randomized placebo-controlled study. Clin Ther 25: 2797-2808, 2003.
In brief, this study accepted 174 Chinese men and women between 20-75 years of age who had mild essential hypertension. They were randomly assigned to take 500mg of stevioside or placebo 3 times per day in capsule form for 2 years. The average BMI of subjects in the stevioside group was 22.9 (SD=2.6) and 23.8 (SD=2.6) in the placebo group. They did not report what the placebo was. They did not report who was blinded (only that it was “double-blind”), or how patients were assigned to their groups (allocation concealment). The authors of the systematic review rated this study as having low risk of bias, but I’m not sure that I agree. Many variables were measured, but I’m only going to focus on BMI because that’s the contribution to the systematic review.
Using BMI cut-offs for Asian populations, both groups are considered overweight (overweight for Asians is 23-26.9). There is still debate as to whether these race-specific cutoffs are appropriate (particularly as “Asian” encompasses so many different populations). At the end of 2 years, the average BMI of the stevioside group was 23.0 (SD=2.0), and 23.6 (SD=2.4) in the control group. Hardly a ringing endorsement of stevioside as a weight-loss supplement.
There’s a trivial problem with the meta-analysis in that the authors chose to compare only the BMI at the end of the trials between groups. So, in this case, they were comparing 23.0 vs 23.6 (with their respective standard deviations). This could have been problematic, particularly if the baseline BMIs of the two groups had been quite different. In this case they weren’t, so it’s unlikely they’ve come up with the wrong answer, but it’s worth mentioning that this is a pitfall.
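To see the pitfall numerically, here is a toy example with invented BMI numbers: when baselines differ between groups, the difference in final values and the difference in change scores can even point in opposite directions.

```python
# Toy illustration (invented numbers) of why comparing only end-of-trial
# values can mislead when baseline values differ between groups.
# Group A starts heavier but improves; group B starts lighter and worsens.
baseline = {"A": 25.0, "B": 23.0}   # mean BMI at randomization
final    = {"A": 24.0, "B": 23.5}   # mean BMI at end of trial

# comparing only final values: A looks 0.5 units WORSE than B
final_diff = final["A"] - final["B"]

# comparing change from baseline: A actually did 1.5 units BETTER than B
change_diff = (final["A"] - baseline["A"]) - (final["B"] - baseline["B"])

print(final_diff, change_diff)  # 0.5 vs -1.5: opposite conclusions
```

In Hsieh’s case the baselines were close, so the final-value comparison happens to be safe; the sketch just shows why that always has to be checked.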
It’s worth noting that of the other two studies included in this forest plot, one also used stevia capsules and the other used unspecified sweeteners in beverages.
The Randomized Controlled Trials- Body Weight
The second outcome reported in the meta-analysis of randomized controlled trials is weight change (instead of BMI). In this case, 5 trials were included in the analysis, presumably because only weight was reported (and not BMI). The sample sizes of the five trials were 308, 163, 33, 213 and 71. The three largest trials (154, 105 and 42) were all funded by industry (though I am still of the opinion that industry-funded research can still be of high quality under the right circumstances). Two of these three showed significant weight loss when compared to either water and/or aspartame avoidance. The authors of the systematic review rated all five of these studies as having a high risk of bias, with the most common element being a blinding issue. This last point is a huge deal because now we have a possible “garbage in, garbage out” pooled estimate.
Again, the pooled estimate is driven mostly by the larger sample size studies: Peters et al (2016) (sample size 154) and Tate et al (2012) (sample size 105).
So here’s Peters JC, Beck J, Cardel M et al. The effects of water and non-nutritive sweetened beverages on weight loss and weight maintenance: A randomized clinical trial. Obesity 24(2): 297-304. 2016.
This study accepted men and women aged 21 to 65, with a BMI between 27 and 40, across a variety of races. They also had to be current users of artificially sweetened beverages at least 3 times per week and willing to stop drinking them if they were put into the water group. They also had to be weight stable (within 10 pounds in the 6 months before the trial) and not exercise more than 300 minutes per week. The study lasted 1 year, starting with 12 weeks of behavioural weight management followed by 40 weeks of weight maintenance. The study did not report how allocation was concealed, and it is clear subjects were not blinded. The sweetened beverage group was instructed to drink at least 24 fl oz of artificially sweetened beverages per day for a year. The water group was instructed to drink at least 24 fl oz of water per day for a year, as well as to avoid artificial sweeteners in beverages like coffee and tea. They were allowed to eat foods with artificial sweeteners in them. Both groups got coupons from Coca-Cola, PepsiCo, and Dr. Pepper/Snapple each month.
Average weight loss over one year was 6.21kg (SD=7.85) in the sweetener group, and 2.45kg (SD=5.59) in the water group, when considering all subjects who were recruited (i.e. the most conservative estimate, since subjects who didn’t finish the study were analyzed using their last available measurement). This difference was considered clinically meaningful and statistically significant.
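The “last available measurement” approach described above is usually called last observation carried forward (LOCF). A minimal sketch of the idea, with invented data:

```python
# Last observation carried forward (LOCF): dropouts are analyzed using
# their most recent available measurement. Data below are invented.
def locf(measurements):
    """Fill None gaps with the most recent observed value."""
    filled, last = [], None
    for m in measurements:
        if m is not None:
            last = m
        filled.append(last)
    return filled

# Weight (kg) at months 0, 3, 6, 9, 12; this subject dropped out after month 6,
# so the month-6 value stands in for months 9 and 12.
print(locf([90.0, 87.5, 86.0, None, None]))
```

LOCF is “conservative” for a weight-loss trial in the sense that dropouts are assumed to have stopped losing weight the moment they disappeared.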
What is most interesting about this paper is that baseline BMI was reported in the manuscript, suggesting that BMI at 1 year was also available, but it was not reported (for inexplicable reasons). It is not clear whether the authors of the systematic review contacted the original authors to get this data. However, if this data HAD been obtained and put into the BMI analysis, the outcome would have been completely different, as the Peters trial had 308 subjects in it, outweighing the Hsieh study by almost two-fold.
And here’s Tate DF, Turner-McGrievy G, Lyons E. Replacing caloric beverages with water or diet beverages for weight loss in adults: main results of the Choose Healthy Options Consciously Everyday (CHOICE) randomized clinical trial. Am J Clin Nutr 95:555-563, 2012.
This study was a three-arm trial that accepted men and women aged 18-65 years with a BMI between 25-49.9 across a variety of races. They also had to be consuming at least 280kcal/d in beverages, excluding white milk, and to have not lost more than 5% of their weight “recently”. This study took 6 months. Subjects were randomly assigned to one of two beverage groups (diet beverage or water) or to the “attention control” group, so called because they still participated in the weight loss program, which consisted of monthly meetings and weigh-ins as well as weekly monitoring. Both beverage groups received their beverages from the study, and were encouraged to replace at least 2 servings per day of caloric beverages with either water or the diet beverage.
Average weight loss over 6 months was 2.6kg in the diet beverage group, and 1.9kg in the other two groups. The authors presented confidence intervals instead of standard deviations, and I’m too lazy to back-calculate them as I’m entering hour 4 of this goddamn post (I told you this was going to hurt). However, the gist of it is that the difference between groups, to me, is not practically meaningful, and statistically, the authors failed to detect a difference between any of the groups (i.e. no evidence to support diet beverages for weight loss). There were possibly meaningful effects on other measurements, but they are beyond the scope of the systematic review. Might this be different if the study had gone for longer? Hard to say.
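For the record, the back-calculation isn’t hard: the Cochrane Handbook’s normal-approximation formula recovers a standard deviation from a 95% confidence interval around a group mean. The numbers plugged in below are invented, not Tate’s actual data:

```python
import math

# Recover a group's standard deviation from a 95% CI around its mean,
# using the normal approximation (Cochrane Handbook). Invented numbers.
def sd_from_ci(lower, upper, n):
    se = (upper - lower) / (2 * 1.96)   # standard error = CI half-width / 1.96
    return se * math.sqrt(n)            # SD = SE * sqrt(n)

# e.g. a mean change of -2.6 kg with 95% CI (-3.6, -1.6) in a group of 105
print(round(sd_from_ci(-3.6, -1.6, 105), 2))
```

(For small samples, the Handbook recommends the t-distribution quantile rather than 1.96, which widens the implied SD slightly.)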
Also interesting about this paper is that baseline BMI was reported in the manuscript, but BMI at 6 months was not, for equally inexplicable reasons. Also unexplained is why the authors of the systematic review did not obtain this data from the original authors to make the BMI meta-analysis complete; this may have been enough to drag that pooled estimate back towards the middle.
ALSO, also interesting about this paper in the meta-analysis is that the authors of the meta-analysis chose to combine both non-diet beverage groups as a single group in the meta-analysis, which, mathematically, skews the result more towards the equivalence line than it would have if the authors had treated the analysis as a 1:1 allocation instead of artificially turning it into a 1:2 allocation of diet beverage to not-diet beverage. I’m not sure how the authors chose to justify this pooling. It is a definite source of bias, even if the overall results of the study show that there was no real meaningful difference between the three groups (statistical or otherwise).
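When reviewers do collapse two trial arms into one comparator, the standard approach is the Cochrane formula for combining means and SDs, which inflates the pooled SD by the gap between the two arm means. A sketch with invented numbers:

```python
import math

# Combine two trial arms into a single group (e.g. merging Tate's water and
# attention-control arms into one comparator). Standard Cochrane formulas;
# the arm sizes, means and SDs below are invented for illustration.
def combine_arms(n1, m1, sd1, n2, m2, sd2):
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # pooled variance accounts for within-arm spread AND the between-arm mean gap
    var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
           + (n1 * n2 / n) * (m1 - m2)**2) / (n - 1)
    return n, mean, math.sqrt(var)

print(combine_arms(100, -1.9, 4.0, 100, -1.8, 4.2))
```

Mathematically the formula is fine; the problem flagged above is conceptual: a 1:1 randomized comparison is silently turned into a 1:2 comparison, and the merged arm’s precision shifts the pooled estimate.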
What’s also interesting and inexplicable in the forest plots is the use of the terms “Weight loss with NNS” and “Weight gain with NNS” (NNS=nonnutritive sweetener), suggesting that if the estimate lies to the right of the equivalence line, that the NNS group GAINED weight, which they did not; they simply lost LESS weight than the control group. This label is clearly misleading and interpretatively incorrect.
Synthesizing the Weight Loss Forest Plot
I think the Weight Loss Forest Plot is problematic.
This analysis does not account for duration bias and treats all studies as equal. If one study lasts for 1 year, and another 6 months, I am not entirely convinced that you can pool them together. Would it be surprising that weight loss was less in the 6-month study? No, it would not. How does this smaller effect size affect the pooled estimate? It generally drags it more towards the no-difference line. If we look at the 5 studies in summary, the studies that were only 6 months long (3 of the 5 in the forest plot) all suggest no evidence for diet beverages as a means of weight loss. However, the other two studies, which do show benefit, lasted 12 and 16 months. In fact, in the supplemental subgroup analysis, the authors demonstrate that there is a significant effect of study duration on whether studies showed a meaningful effect (p=0.0009). This isn’t commented on in the discussion of the paper at all, and I think it is a huge boo-boo, particularly from a “knowledge translation to media” point of view, because a critical confounder has been buried in an appendix.
Trying to put it all together
An incomplete BMI analysis
The authors would have us believe that there is no supporting evidence to use non-nutritive sweeteners in the realm of improving BMI. I feel very strongly that this analysis is incomplete, particularly because there are at least 2 other larger studies reported in this very manuscript, where that BMI data exists. I know that the Peters trial would pull the pooled estimate to the left (in favour of using non-nutritive sweeteners). I’m not sure how the Tate trial would move that estimate. Maybe it wouldn’t make a difference at all. But until someone takes this on again in a few years, we’ll never know. And while I agree that the current, incomplete analysis does show there is no supporting evidence to use non-nutritive sweeteners to improve BMI, there sure as hell isn’t any evidence to show that there isn’t a reason to use them either. The ball is essentially dead, unable to be moved forward in this case.
Two trials using stevioside as medication, not as a dietary aid.
I think the authors incorrectly framed their original question, “Is routine consumption of non-nutritive sweeteners by adults and adolescents associated with adverse long-term cardiometabolic effects in RCTs and cohort studies?” The question informed the search strategy and the screening process of eligible papers. Two of the seven trials in this paper reported the use of stevioside in capsule format. This is essentially using stevia as a medication, or supplement, as opposed to a dietary strategy. It’s not a typical use case of a nonnutritive sweetener. Subjects in these studies didn’t replace calories derived from caloric sweeteners with stevia. It is therefore hardly surprising that supplementing your diet with stevia, in the absence of any other dietary changes, doesn’t result in changes in BMI. In fact, I think it would be totally surprising if it did, and the world would probably have razed Brazil to the ground with the demand for stevia by now. I don’t think you can pool these capsule studies together with dietary replacement studies. And this would also have completely changed the BMI analysis.
The struggle of, “Garbage in, garbage out” vs. scientific integrity in systematic reviews
As I’ve written before, all that glitters is not gold. Data synthesis has to take data quality into account. Heck, any data analysis has to take data quality into account. If the majority of your data is flawed, the effect of those flaws only becomes magnified in a synthesis analysis. I personally struggle with balancing this in data synthesis. On the one hand, presenting the results of your experiment is just that. You are practically duty-bound to publish the results of your experiment when you said you would (to your funding agencies, to the study registry, to yourself). It’s left up to the reader to interpret your findings (down to your interpretation of your own findings). There is nothing in this paper’s overall design that is inherently wrong. The question that remains unanswered, however, is whether the paper adds anything to the overarching conversation on artificial sweeteners. If the studies that contribute the most to your conclusions are all of low quality, have you moved the ball forward? Does the ball move forward when you simply reveal that all of the evidence is low quality? Should you then present a forced data synthesis, essentially making the best shit sandwich you can out of the shit you have now discovered is the only ingredient you have? Or, to put it less vulgarly: Systematic reviews are a bit like field research. You might set out to see what kinds of finches live in the Galapagos, only to find they’re all dead. You had no way of knowing, but now that you’ve found piles of bird corpses, what do you write about them?
I struggle with this because we, as scientists, can no longer hide behind the veil of the accessibility wall. Our work is read, for better or worse, and can be read by anyone, to be used to forward nefarious agendas, to entertain, and least of all, it seems, to inform. Do we have any responsibility towards that reality? It seems terribly paternalistic, I know, but there isn’t a “world of science” and a separate “everything else” anymore. Can you really send out “art” in the form of a Nigerian prince and say that the money that showed up wasn’t your fault because you were just exhibiting your art? I don’t have an answer for this, but it’s definitely worth discussion.
Analysis takes time and digging. You can’t just stop at the published manuscript.
My friends tell me that I don’t post often, but that every post is crazy long. This one’s length has a purpose. I’ve only scratched at maybe 1/3 or 1/2 of this paper and I’m approaching an unreasonable number of words by any standard. I predict that in the coming days, the media will report what’s at the surface of this manuscript and move on, leaving confusion in their wake. I started writing this post 9 hours ago, including re-reading the original paper and reading the three studies that seemed to influence the authors’ conclusions the most. I got a haircut somewhere in there, and took a nap, because I needed to clear my brain, so all in all, I’m probably at the 6-7 hour mark.
This post is this long and has taken this long to write because that’s how long critical appraisal of literature takes. If you really care that much about artificial sweeteners, this is what it takes to get to the point where the study starts to make any sense at all (which, in the end, might be no sense at all), and to the point where you can choose to incorporate the “evidence” into your life, or the lives of the people that you interact with/advise/coach/doctor/train. If you aren’t willing or able to do this work, to dig this deep (and it’s NOT THAT DEEP), then any opinion you express is incompletely informed, and you are perpetuating the unending cycle of ignorance and misinformation that propagates virally through the ether of the Internet and allows charlatans to exist. It is a disservice to the people who pay heed to your words.
I told you this was going to hurt.