Recently, Friends of the Earth published a paper in the journal Environmental Research (Fagan et al. 2020, “Organic diet intervention significantly reduces urinary glyphosate levels in U.S. children and adults”) claiming that eating an organic diet reduces glyphosate levels in the urine of adults and children. Several media outlets picked it up (I’m aware of these: here, here, and here), and it has been making the rounds on social media (which is where I first saw it).
The problem is, the Friends of the Earth study was completely flawed. The study design was flawed. The analysis was flawed. The data in Table 1 are flawed.
In other words, this study is likely not usable by anyone. At all. Anywhere.
Do This Study’s Results Actually Matter?
Let’s cut to the chase: most of you really just want to know, does this matter?
Speaking from a toxicological point of view, absolutely not.
The levels in the urine correspond to blood levels that are well below the threshold where toxicity has been seen.
So there ya have it folks, if you’re here for the bottom line, that’s it — this is a big nothingburger.
On top of that, the study is completely flawed: the study design is wrong, too few families were used, the analysis is flawed, there may have been a sample mix-up, and the data in one of the tables are flawed.
If you’re here for the science, keep reading.
The Authors Won’t Share Their Data or Statistical Code…
I really didn’t trust the study after I read it, and I wanted to reanalyze the data correctly (see below), so I reached out to the corresponding author, Dr. Klein. I asked her for access to the data, access to the statistical analysis code, and more details on how the analysis was conducted. Dr. Klein refused to share the data with me, stating, “We aren’t sharing the original data set…” She also refused to give me the additional information I asked for. Refusing to give anonymized data to another scientist after the paper is published is not cool. Refusing to give statistical analysis code is also not cool. These types of refusals tend to erode the community’s trust in the study.
I waited to post this blog until I heard back from Dr. Klein. I was hoping I would be able to reanalyze the data, and then post about a more appropriate analysis of their data.
So now here’s the rest of the original blog post detailing all of the flaws that I found. Go give it a read, and let me know if you find additional flaws (find my info at the Raptor Pharm & Tox, Ltd page).
Flaw 1: Data Reporting
When I review a paper for a journal, I have a checklist I run through. One of the first things on my checklist is to make sure the data being reported in tables and figures make sense.
Here’s Table 1 from Fagan, et al:
The first issue is that the median concentration of glyphosate in the urine of adults eating a conventional diet is 1.19ng/mL, which is larger than the max value of 0.82! It’s also not within the interquartile range (explainer: the interquartile range spans the 25th to the 75th percentile of a distribution; the median sits smack dab in the middle at the 50th percentile). Clearly something is wrong with that median.
Next up are the median concentrations of glyphosate and AMPA (glyphosate’s main breakdown product) in the urine of adults fed an organic diet. In both cases, the median (50th percentile) is the same as the 25th percentile. That’s only possible if every measurement between the 25th and 50th percentiles is tied at exactly the same value (for example, a single imputed value assigned to samples below the detection limit), and the authors never say that’s what happened.
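This is exactly the kind of check on my reviewer checklist, and it’s easy to automate. Below is a minimal sketch (the function name `summary_violations` is my own, hypothetical helper) that flags any reported summary statistics breaking the ordering min ≤ 25th percentile ≤ median ≤ 75th percentile ≤ max. Only the two values quoted above (median 1.19, max 0.82) are used; I don’t fill in the rest of Table 1.

```python
from typing import Mapping, Optional

# Summary statistics must appear in this order for any real distribution.
ORDER = ("min", "q1", "median", "q3", "max")


def summary_violations(row: Mapping[str, Optional[float]]) -> list[str]:
    """Return every pair of reported summaries that breaks min <= q1 <= median <= q3 <= max.

    Missing entries (absent keys or None) are skipped, so a partially
    reported table row can still be checked.
    """
    present = [(name, row[name]) for name in ORDER if row.get(name) is not None]
    problems = []
    for i, (lo_name, lo_val) in enumerate(present):
        for hi_name, hi_val in present[i + 1:]:
            if lo_val > hi_val:
                problems.append(f"{lo_name} ({lo_val}) > {hi_name} ({hi_val})")
    return problems


# Adults, conventional diet, glyphosate (ng/mL): only the two values quoted in the text.
adults_conventional = {"median": 1.19, "max": 0.82}
print(summary_violations(adults_conventional))
```

Checking every pair rather than only neighbours means the function still catches problems when some columns of a table row are missing.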
Some people like to say, “Hey, it’s okay, they made a few mistakes, those can be corrected.” And I say yes — you are right, those are human mistakes, I’ve made them myself, and I’ve had reviewers catch them, thankfully (I prefer when I catch them before the paper goes out for review).
The problem is that 1) the peer reviewers should have caught this, and it appears they didn’t (so the peer review wasn’t very good in this case), and 2) yeah, mistakes happen, but these are data that are critical to understanding the authors’ findings. A mistake here raises new questions about the quality assurance, recordkeeping, and quality control of the authors and their laboratories.
In regulatory toxicology we have a saying, “If you mess up a data table in your regulatory filing, how can I trust anything else you did in the laboratory?”
Flaw 2: A Sample Mixup?
In Figure 4 there appears to be a sample mixup:
The black line shows a really low glyphosate urine level in that adult while on the conventional diet, with a far higher level following the organic diet.
The authors explain this result as “In the case of the exception, the overall trend during the organic phase was downward except for a spike in urinary glyphosate level on day 9, which may be due to consumption of non-organic food recorded in the subject’s food diary.”
I’m just a bit dubious about this explanation. Let me explain why:
If the results are real for the adults, then the data show a decrease in urine glyphosate concentrations following a transition from a conventional diet to an organic diet (which is what the authors wanted to see). However, the one adult who is the exception has a really low level of glyphosate in their urine, one that matches the organic-diet levels in all of the other adults.
In addition, that same anomalous adult has a glyphosate urine concentration during the organic diet that looks suspiciously high — similar to the levels in the other adults eating a conventional diet.
It almost appears that the organic diet and conventional diet urine samples were all switched for this one individual.
Let’s Test Their Explanation With Math!
The authors’ explanation is that the spike is due to the subject eating conventional food on day 9. Okay, let’s test that hypothesis mathematically:
So they have 5 days of values that they use while folks are on the organic diet (they drop the first day that folks are eating the organic diet, so they use days 8-12). Based on Figure 4 and on how the authors say they handle concentrations below the limit of quantitation, this person’s average glyphosate concentration during the conventional phase appears to be set at or near 50% of the limit of quantitation, or 0.025ng/mL.
If the person ate a conventional food product on day 9, we would expect to see the higher level wash-out by 24hrs based on their Figure 2. So we would anticipate seeing a spike on day 9, washout by day 11, with day 10 being slightly elevated.
The problem is that the only way to see a decrease in this person’s glyphosate levels is if their urine concentration was below the limit of detection, in which case the authors would set it at 0.02/√2 ≈ 0.014ng/mL, which I’ll round to 0.01ng/mL.
Given that information we have the following measures for days 8-12:
0.01, 0.01, 0.025, 0.01, 0.01
That would result in a mean value of 0.013ng/mL for the organic phase in this adult, more than 20-fold below the authors’ reported 0.30ng/mL.
So let’s give the authors the benefit of the doubt — how large of an increase in glyphosate during the organic phase would it take to get a level of 0.30ng/mL?
The answer: the glyphosate urine concentration would have to go up to about 1.46ng/mL on day 10 to produce an average of 0.30ng/mL, given my estimates for days 8-9 and 11-12.
In other words, the authors’ explanation does not comport with the expected data given what was reported.
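Here’s the same arithmetic as a short script so anyone can rerun it. It hard-codes my assumptions from above (limit of detection 0.02ng/mL, half the limit of quantitation = 0.025ng/mL, the reported organic-phase mean of 0.30ng/mL); these are my reading of the paper, not the authors’ code, which they wouldn’t share. With the rounded 0.01ng/mL value the required day-10 value is about 1.46ng/mL; without rounding (0.014ng/mL) it is about 1.44ng/mL, so the conclusion doesn’t hinge on rounding.

```python
import math

# Assumptions from my reading of Fagan et al. (not the authors' code):
LOD = 0.02            # ng/mL, limit of detection
LOQ = 0.05            # ng/mL, implied by "50% of the LOQ = 0.025 ng/mL"
REPORTED_MEAN = 0.30  # ng/mL, organic-phase mean reported for this adult

below_lod = round(LOD / math.sqrt(2), 2)  # 0.0141 rounded to 0.01 ng/mL
half_loq = LOQ / 2                        # 0.025 ng/mL

# Organic-phase days 8-12; day 10 carries the one elevated (half-LOQ) value.
organic = {8: below_lod, 9: below_lod, 10: half_loq, 11: below_lod, 12: below_lod}
expected_mean = sum(organic.values()) / len(organic)

# What would day 10 need to be for the 5-day mean to equal the reported value?
other_days = sum(value for day, value in organic.items() if day != 10)
required_day10 = REPORTED_MEAN * len(organic) - other_days

print(f"expected organic-phase mean: {expected_mean:.3f} ng/mL")
print(f"day-10 value needed for a {REPORTED_MEAN} ng/mL mean: {required_day10:.2f} ng/mL")
```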
A Far More Reasonable Explanation…
It is far more reasonable to expect that the authors accidentally swapped the samples. The values for the glyphosate concentration while on the organic diet are far more consistent with the glyphosate urine concentrations while the other adults were on the conventional diet.
Flaw 3: Improper Statistical Analysis
Okay, this experimental design is complicated, so I’m going to try to break this down as best I can.
The authors recruited 4 families. Given the ages of the adults and children in the families, it is reasonable to expect that each family constitutes a household. Let me be clear about what I mean: a household is a family that lives together under one roof. A household, especially given the ages of the children involved, is likely to eat the same food (parents generally don’t like to play short-order cook for their family [at least we don’t in our house], so the kids typically eat the same or similar food as the parents).
The authors collected first-morning urine for 6 days while the participants ate conventional food (what they normally eat). After those 6 days, the participants were switched to organic food. The urine samples from days 1 and 7 were generally not used, unless a sample was missing (which happened in the organic phase), in which case they used day 7 urine.
Because each person gave repeated samples, the authors analyzed the data using some type of general linear mixed model. I don’t know what the parameters are or how the model was structured, because the authors refused to give me that information.
Here’s the problem: remember that the participants were recruited as families? Each family is a household. Well, that’s kinda the rub here. The authors have a multiply nested design: their samples are nested within each individual, AND each individual is nested within their household/family.
How Were The Authors Supposed To Analyze This?
Welp, in frequentist statistics the authors should have used a mixed model that could account for BOTH levels of nesting, not just within individual. In Bayesian-speak, we’d say they have a multi-level hierarchical model. This is pretty standard stuff for most biostatisticians — we run into this all of the time.
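For readers who want to see what respecting both levels of nesting looks like, here’s a dependency-free sketch. A full analysis would fit a mixed model with a random effect for family and another for person within family (in Python, statsmodels’ `MixedLM` can express the inner level through its `vc_formula` argument). To keep this self-contained, the sketch swaps in a simpler two-stage summary approach: average each person’s samples on the log scale, take each person’s organic-minus-conventional difference, then average people within a family so each household contributes exactly one number. All data are simulated; the family/person IDs, family sizes, and effect sizes are hypothetical, not taken from Fagan et al.

```python
import math
import random
from collections import defaultdict
from statistics import mean

random.seed(2020)

# SIMULATED data only: 4 hypothetical households x 4 people x 2 diets x 5 samples.
records = []  # (family, person, diet, glyphosate in ng/mL)
for fam in ("F1", "F2", "F3", "F4"):
    household_baseline = random.gauss(0.0, 0.5)   # exposure shared through shared meals
    household_response = random.gauss(-1.0, 0.4)  # household-level log-scale response to the switch
    for p in range(4):
        person = f"{fam}-P{p}"
        person_baseline = random.gauss(0.0, 0.3)  # individual differences within a household
        for diet in ("conventional", "organic"):
            shift = household_response if diet == "organic" else 0.0
            for _day in range(5):
                log_conc = -2.0 + household_baseline + person_baseline + shift + random.gauss(0.0, 0.4)
                records.append((fam, person, diet, math.exp(log_conc)))


def family_log_ratios(rows):
    """Collapse samples -> person-level diet differences -> one mean per family."""
    logs = defaultdict(lambda: defaultdict(list))
    for fam, person, diet, conc in rows:
        logs[(fam, person)][diet].append(math.log(conc))
    by_family = defaultdict(list)
    for (fam, _person), by_diet in logs.items():
        by_family[fam].append(mean(by_diet["organic"]) - mean(by_diet["conventional"]))
    return {fam: mean(diffs) for fam, diffs in by_family.items()}


ratios = family_log_ratios(records)
n_people = len({r[1] for r in records})
print(f"{len(records)} samples, {n_people} people, {len(ratios)} independent families")
for fam, r in sorted(ratios.items()):
    print(f"{fam}: mean log(organic/conventional) = {r:.2f}")
```

However many samples go in, the diet comparison ends up resting on 4 household-level numbers, which is the whole point of the nesting argument.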
And The Ramifications for Not Analyzing Properly?
Welp, kinda got bad news for the authors. Their p-values aren’t valid, for one. They used 7 degrees of freedom for adults, 9 degrees of freedom for the children, and a total of 16 degrees of freedom for the combined analysis. That’s not right at all.
4 degrees of freedom — that’s how many they have for adults.
4 degrees of freedom — that’s how many they have for the kids.
4 degrees of freedom — that’s how many they have for the combined analysis.
That’s a pretty big difference.
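To see why the degrees of freedom matter, here’s a small stdlib-only sketch that computes a two-sided p-value from Student’s t distribution (its density integrated numerically with Simpson’s rule) for one t statistic at each of the degree-of-freedom values above. The t statistic of 3.0 is hypothetical, chosen for illustration; it is not a value from Fagan et al., whose test statistics I don’t have.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), integrated with Simpson's rule.

    `steps` must be even for Simpson's rule.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area


T_STAT = 3.0  # hypothetical test statistic, for illustration only
for df in (16, 9, 7, 4):
    print(f"df = {df:2d}: two-sided p = {two_sided_p(T_STAT, df):.4f}")
```

The same test statistic that clears p < 0.01 with 16 degrees of freedom lands around p ≈ 0.04 with 4, which is why inflating the degrees of freedom makes results look stronger than they are.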
Flaw 4: But Wait, There’re More Stats Issues Here
It appears the authors also did some pairwise comparisons. I’m guessing they performed model-based contrasts, like Wald statistics (model-based t-tests). I’m guessing because I don’t actually know: the authors didn’t say what they did, and when I asked them to share that information, they refused.
But here’s the other rub — if you’re going to do multiple tests, then you need to do some type of correction.
What am I talking about?
So, each time you run a statistical test comparing two groups, you add another chance of a false positive, which increases the probability of getting at least one false positive across all of your tests. Said another way, the more you test, the more likely you are to conclude there’s a difference somewhere when there isn’t actually one. That’s a problem.
In statistics, we control the Family-Wise Error Rate — that’s the fancy way of saying we adjust the statistical test or the p-value to prevent that increase in false positives.
In this case, the Bonferroni correction is appropriate, and it’s simple: if the authors want to keep the family-wise false positive rate at 5% across 6 comparisons, then each p-value needs to be less than 0.05 / 6 ≈ 0.0083.
The authors didn’t say that they controlled the Family-Wise Error Rate, and if they did, they didn’t say how they did it (again, if I peer-reviewed this paper I’d have them put these types of details in).
What the authors do give us is the fact that the p-values are < 0.01 — that’s not actually very informative. So I don’t know if the authors meet the adjusted p-value threshold I calculated or not.
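Here’s a quick sketch of both ideas: how fast the chance of at least one false positive grows across 6 uncorrected tests (this formula assumes the tests are independent), and the Bonferroni cutoff, which holds regardless of dependence. The two p-values in the loop are hypothetical, chosen to show why a bare “p < 0.01” doesn’t tell us whether a result survives the correction.

```python
def fwer_independent(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests, each run at `alpha`."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test cutoff that keeps the family-wise error rate at or below `alpha`."""
    return alpha / k


ALPHA, K = 0.05, 6
cutoff = bonferroni_threshold(ALPHA, K)
print(f"uncorrected family-wise error rate for {K} tests: {fwer_independent(ALPHA, K):.3f}")
print(f"Bonferroni per-test cutoff: {cutoff:.4f}")

# Both hypothetical p-values would be reported as "p < 0.01",
# but only one of them clears the corrected cutoff.
for p in (0.004, 0.009):
    print(f"p = {p}: {'passes' if p < cutoff else 'fails'} Bonferroni")
```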
Flaw 5: Statistical Power
Generally, when we have small sample sizes, such as 4 families, we also have low statistical power. Low power doesn’t just mean missing real effects: it also means a larger share of the “statistically significant” results are false positives, and the effects that do reach significance tend to be exaggerated. I’ve covered this at the Toxic Truth Blog here, but it’s also been discussed in the literature here and here. This small-sample-size issue is likely the leading cause of reproducibility problems in toxicology today, and this study is simply contributing to the problem.
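The link between low power and false positives can be made concrete with the standard positive-predictive-value calculation: of all the results that come out “significant,” what fraction reflect a real effect? The sketch below assumes a hypothetical prior odds of 1 real effect for every 4 null hypotheses tested; that number is for illustration only, and the power values are likewise hypothetical.

```python
def positive_predictive_value(power: float, alpha: float, prior_odds: float) -> float:
    """Fraction of significant results that reflect a real effect.

    prior_odds is the ratio of real effects to null hypotheses among those tested.
    """
    true_hits = power * prior_odds  # real effects that reach significance
    false_hits = alpha              # null effects that reach significance anyway
    return true_hits / (true_hits + false_hits)


ALPHA = 0.05
PRIOR_ODDS = 0.25  # hypothetical: 1 real effect per 4 null hypotheses

for power in (0.8, 0.5, 0.2):
    ppv = positive_predictive_value(power, ALPHA, PRIOR_ODDS)
    print(f"power {power:.0%}: {ppv:.0%} of significant results are real, "
          f"{1 - ppv:.0%} are false positives")
```

Under these assumptions, dropping from 80% to 20% power takes the false-positive share of significant findings from 20% to 50%.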
Flaw 6: So Few Families Means We Can’t Generalize To The Population
The other problem with small studies like this one is that we can’t generalize the results to the general population: a handful of families is very unlikely to be representative of it. To make this study useful and generalizable, each of the flaws has to be addressed, but especially the number of people in the study.
Flaw 7: The Authors’ Gross Lack of Statistical Knowledge
This sentence from the authors is quite problematic:
Point 1: The fact that a study reached statistically significant conclusions is not a measure of success. It demonstrates that the authors were on a mission to find a statistically significant result.
That’s not how you do these types of statistics. The authors are using the Null Hypothesis Significance Testing (NHST) framework. This framework starts by assuming the null hypothesis is true and only rejects it when the evidence against it is strong; the burden of proof is on the claimed effect, not on the null. You should design an experiment that is well powered, to avoid sampling problems and false positives. You want more experimental units (families, in this case) in your study because you want to be sure you are not rejecting the null hypothesis without good evidence.
Point 2: In a mixed model with repeated sampling, the repeated samples add far less statistical power than the raw count suggests. Having 158 samples didn’t make the experiment work, because the degrees of freedom the authors used were 16, 9, or 7, depending on the contrast being run. Not 158! Again, this demonstrates that the authors do not understand the statistical tests they are using.
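One common way to see how little repeated sampling adds is the Kish design-effect approximation: samples from the same person are correlated, so n clustered samples behave like a smaller number of independent ones. The sketch uses the 158 samples from 16 people mentioned above; the intraclass correlation (ICC) values are hypothetical, and this handles only one level of clustering, so the extra nesting within households would shrink the effective sample size further.

```python
def effective_sample_size(n_samples: int, n_clusters: int, icc: float) -> float:
    """Kish design-effect approximation for clustered (repeated) measurements.

    icc is the intraclass correlation: how alike two samples from the same person are.
    """
    avg_cluster_size = n_samples / n_clusters
    design_effect = 1 + (avg_cluster_size - 1) * icc
    return n_samples / design_effect


# 158 samples from 16 people; the ICC values below are hypothetical.
for icc in (0.0, 0.3, 0.6, 0.9, 1.0):
    n_eff = effective_sample_size(158, 16, icc)
    print(f"ICC {icc:.1f}: about {n_eff:.1f} effectively independent samples")
```

At one extreme (ICC = 0) every sample counts fully; at the other (ICC = 1) the 158 samples carry no more information than the 16 people who gave them.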
Point 3: Only 16 individuals and yet the authors feel like this is a good study and can be generalized to the entire population? That is simply ludicrous. No decent regulatory agency in the world would approve a drug if only 16 people ever took it, if the drug were meant for the entire population. That’s simply wrong!
Flaw 8: Are These Actually Random Families?
I asked the authors for additional details about how the families were chosen to participate. I was told all of the information I needed was in the paper or the supplementary data. Here’s what the paper said:
It’s that first sentence that gets me: “Families were originally contacted via a recruitment email…” So how exactly did Friends of the Earth get the families’ email addresses in the first place? I asked the authors that question and was directed to the paper and the supplementary data.
If Friends of the Earth knew of these people already, and had their emails, that suggests these aren’t random people. If these aren’t random people, then there is a risk of bias associated with these subjects, especially given that they knew that they were eating organic food and when.
If you made it this far, thanks for reading. There’s a lot to digest here.
Here’s the bottom-line:
The levels of glyphosate that result in these concentrations in the urine are so tiny that they aren’t even toxicologically relevant. It doesn’t matter if the results are real — the levels of glyphosate aren’t relevant. There are numerous problems with this study, and it’s simply not trustworthy.