10/21/2013-My response to “Trouble at the lab”

A recent article from The Economist, “Trouble at the lab”, published 10/19/2013, claims that the scientific process is failing due to a number of factors such as the drive to publish in high quantities, the lack of publication of negative data, improper statistical analyses, and poor peer review. This failing, The Economist claims, is leading to large numbers of irreproducible findings becoming part of the scientific canon. Although I doubt you will find any scientist who will state that errors or mistaken conclusions are never published (and that scientist would be dead wrong), I am sure that most of us in the field strongly disagree with the conclusions of this article. The article relies on three main arguments:

1. Scientific findings that are currently being questioned

The article starts with the concept of priming, a tenet of psychology (not my area) proposed a decade ago that is now coming under question due to irreproducibility. In addition, two recent studies by Amgen and Bayer HealthCare are described that showed only a low percentage of “seminal” studies were reproducible. The article also cites a 2010 Science paper that proposed a genetic basis for longevity and was retracted a year later because “Other geneticists immediately noticed that the samples…had been treated in a different way…” as an example of incorrect science being published. But are not all of these points evidence that the field is working and self-correcting? The public needs to realize that science is complicated and takes time. Self-correction will never be immediate, but is rather a process that plays out over many years.

2. Journals and peer reviewers are not doing their job

The article describes a recent study by John Bohannon from Harvard, who submitted an intentionally flawed paper to 304 open-access journals. Of these, 157 accepted the paper, often without significant peer review, which the article presents as evidence that the peer review process is failing. This particular study was targeted at many “predatory” journals. These types of journals are not interested in scientific quality, but merely try to lure authors into quick publications to collect hefty publication fees. It is quite misleading for the authors of “Trouble at the lab” to imply these journals are at all representative of the peer review process in established, respected journals. Moreover, the Bohannon study is itself evidence that the scientific community is self-policing. In fact, the Bohannon study found that PLOS One, the largest open-access journal in the world, rejected the flawed paper and provided some of the most detailed and accurate reviews. Yet the authors of “Trouble at the lab” cite PLOS One’s 50% rejection rate as further evidence of a flawed peer review system. It is in fact quite the opposite! If 50% of all papers submitted to PLOS One, which is considered an “easier” journal to publish in, are rejected, imagine how difficult it is to publish in more established journals. I speak from experience, believe me.

3. A statistical argument that statistics are flawed (irony?)

Disclosure: I am not a statistician, but I’ll try to present the article’s main argument. The Economist argues that at the standard significance threshold of 5%, meaning that a result at least this strong would occur by chance only 1 in 20 times when there is no real effect, many published conclusions are incorrect. This is based on their estimate that only 10% of hypotheses tested are correct (the basis for this estimate is not at all clear). With a power of 0.8 (i.e. 2 out of 10 correct hypotheses fail to be supported due to chance), one finds that 8% of all hypotheses tested are correct and supported, while 4.5% are incorrect but appear supported (the 90% wrong hypotheses multiplied by 5%). In other words, about a third of the apparently supported hypotheses (4.5 out of 12.5) are incorrect due to chance.
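For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation as I understand it from the article. The function name and its parameters are my own labels, not anything The Economist provides, and the 10% prior is their estimate rather than an established figure.

```python
def false_discovery_share(prior_true, power, alpha):
    """Return (true positives, false positives, share of positives that are false).

    All three values are fractions of the total number of hypotheses tested.
    prior_true: assumed fraction of tested hypotheses that are actually correct
    power:      chance that a correct hypothesis yields a significant result
    alpha:      significance threshold (chance a wrong hypothesis looks significant)
    """
    true_positives = prior_true * power
    false_positives = (1 - prior_true) * alpha
    share_false = false_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_false


# The article's numbers: 10% of hypotheses correct, power 0.8, threshold 5%.
tp, fp, share = false_discovery_share(prior_true=0.10, power=0.80, alpha=0.05)
print(f"correct and supported:   {tp:.3f}")     # 0.080
print(f"incorrect but supported: {fp:.3f}")     # 0.045
print(f"share of supported results that are wrong: {share:.2f}")  # 0.36
```

Changing `prior_true` shows how strongly the "one third" figure depends on the assumed 10%: the larger the fraction of correct hypotheses going in, the smaller the share of false positives coming out.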

The major flaw in this argument is the assumption that a hypothesis is deemed “correct” based on one result at the 5% significance level. Wouldn’t it be nice if I could publish a paper with only one figure: THE experiment showing my model is correct! Rather, in good publications, multiple experiments are done to test one hypothesis from many different angles. In fact, a paper from my lab that was just accepted last week in Molecular Microbiology has 14 data figures! Therefore, if an average paper uses three pieces of evidence to support one hypothesis, each tested at the 5% significance threshold, then the probability that all three observations occurred by chance would be 0.05^3, or 0.000125. Even then, good scientists do not state their hypothesis is “correct”, but rather “supported” by the current data. On top of that, ideas are not fully accepted into the science lexicon until they have been repeated by others in different settings, further raising the rigorous bar an idea must clear before becoming a widely accepted hypothesis. The cream rises to the top and the rest falls by the wayside.
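The three-experiment figure above can be reproduced with a short Python sketch. The function name is my own, and the calculation assumes the three pieces of evidence are statistically independent, which is an idealization: if the experiments share a common artifact, the true joint probability can be much closer to 0.05 than to the product.

```python
def joint_chance_probability(alpha, n_experiments):
    """Probability that n independent chance results all pass the alpha threshold.

    Assumes independence between experiments (a simplifying assumption).
    """
    return alpha ** n_experiments


# Three independent lines of evidence, each at the 5% threshold.
p_joint = joint_chance_probability(alpha=0.05, n_experiments=3)
print(f"{p_joint:.6f}")  # 0.000125

# Note the operation is a product (0.05 cubed), not 0.05 times 3,
# which would give 0.15 and is not a joint probability at all.
print(f"{0.05 * 3:.2f}")  # 0.15
```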

Bottom line: This article should be rejected

I am concerned that this misleading piece by The Economist will be misinterpreted and used as ammunition in the “War on Science” that we see play out in public forums. Much of the evidence provided in this article supports the view that the scientific process is working. No one denies that papers are published with incorrect conclusions. But this is because we are studying complex processes in greater and greater detail. As much as we would wish it, science does not occur as a relatively linear set of experiments from point A to point B, but rather as a tangled web of inquiry from point A to point Z, sometimes leaping forward, sometimes retreating backward, but ultimately increasing our understanding of the world around us. From my vantage point, “the lab” is doing just fine and will continue to drive discovery and innovation, providing solutions to society’s biggest problems.


  1. I just gave a talk about this topic for my lab meeting – I’m a post-doc in ‘stem cell’ biology. Actually I gave the talk at a yearly lab seminar where we talk about something in science that we feel strongly about that has no direct relationship to our projects. It is a good idea for any lab 🙂

    Interestingly many must be thinking along these lines since many major journals appear to be addressing the issues of peer review and publication of science lately.

    In my talk I showed a graph depicting the amount of data that was contained within high impact papers from 1920-2013 using Science magazine as an example. Granted this is setting the bar very high but since this is the gold standard for how we move forward in our careers it seems appropriate. What is glaringly obvious to anyone who reads older papers is that the amount of data required for publication has grown exponentially in the last 20 years. This seems to mirror the growth of the internet and computer technology and perhaps that isn’t surprising given that it would have been impossible to have 10-20 supplemental figures 20 years ago. However I wonder whether you agree that this growth in data/paper is a self-perpetuating problem in that now reviewers have begun to expect more and more data for a paper to be deemed a ‘complete story.’ The very notion of a ‘complete story’ seems ridiculous to me and I cringe whenever I get reviewers telling me that I lack ‘mechanism’ and need to do more experiments to demonstrate significance in ‘high impact animal models’ of disease.

    I agree with some of your views on this article but I think that the article missed the most important points of the debate. I think the real problem is the incredible delays that we experience in seeing data because people are willing to keep their results private for years waiting for the 10+ figures worth of results that are necessary for submission to journals with high impact factors. Not to mention the incredible amount of waste that results from reviewers demanding experiments from labs that do not have the expertise to carry out such experiments. A personal example is a paper I had recently sent to Nature where the reviewer demanded a mouse model of disease to demonstrate clinical impact. I was a Ph.D. student at the time in a human biology lab with no established animal models. We sought collaborators but that took a lot of time (not to mention the fact that most collaborators are unwilling to invest large amounts of time and money into experiments where the PI or experimenter is not likely to be first/last author). We even developed a few models in the lab but were unable to get strong results, possibly because of a lack of expertise. In the end we have sought lower level journals for the findings.

    So I ask why is it useful for us to bear the responsibility of telling a story from A-Z before publishing? What if we could just submit ‘Figure 1’ and let the community decide whether it is interesting enough to replicate and move forward with? In that way I could get credit for describing the molecular finding in human primary cell culture, another lab could get credit for doing the mouse ‘clinical’ models, and yet another lab could get credit for working out the biochemistry. This could be accomplished in a fraction of the time if all labs were using methods and tools that were native to the different labs. This would also cut down on the amount of data out there that was not reproducible. It would be deemed reproducible purely through the efforts of distinct labs each working towards a common goal separately. That seems to be the way Science worked before the era of ‘complete stories’ and volumes of supplemental figures (which I fear nobody really ever reads).

    The issue then becomes how we determine promotion, etc. I believe that this could be solved by simply using a system like the one Google uses to rank websites after a search. It would be accomplished by looking at how many labs link their findings to a specific finding. The greatest credit would go to the person who made the seminal observations and less credit would go to those who repeat the finding in different models, settings, etc. Each experiment would function as a node in a large collection of data.

    It sounds really difficult but I think it could work. What are your thoughts? This may be a bigger problem in the biomedical sciences but I think it would be useful in many different areas of Science. I heard that math works a bit like this…

    Thanks for the article.

    1. Dear Jeff,
      Thank you very much for your comment and I agree with you 100% about the growing bar for publications. I have been publishing since 2001 and have seen quite a difference since my grad school/post doc days. Almost every paper we now submit first comes back as a soft reject as, like you state, the reviewers request 3-4 more experiments that need to be completed. Many of these experiments ultimately end up as supplemental data. We usually are able to complete these requests but it delays publication by months or even years.

      This is one of the main problems I had with The Economist article as in my experience both as a submitter and as a reviewer, “bad” reviews do not usually err on the side of not examining the paper enough, but rather on the side of overzealousness asking for much more data to complete mechanisms or animal models (I am thinking about blogging on this topic!). I think a reviewer’s job is to examine the work as presented, perhaps suggesting a key experiment or control if needed. But if the paper is not sufficient at that level then it needs to go to another journal.

      Your idea of publishing one figure at a time is interesting and something I have not heard before. I think it might be difficult to do because our interpretation of Fig. 1 often changes as we develop the “story”. So the data might be fine, but understanding what it means takes more effort. I think this is what reviewers/editors want when asking for the complete story and it is what we try to do in my lab. But I always present mostly unpublished work when I give talks, so I certainly do not try to hoard new results (but it might come back to bite me and my students).

      To address the problems you mention, there are a number of proponents of publishing all of one’s work in an open-access journal like PLOS One, which doesn’t judge a manuscript based on impact, even though the research might make it into a higher impact journal. The review process is theoretically faster, the amount of data needed is lower, AND the information is freely accessible. I like this idea a lot and think that the good papers would ultimately rise to the top. However, it is risky for young researchers like you and me as the administration and establishment might not give us as much credit for these pubs. But I think publishing is moving in this direction and might be a solution to decreasing the amount of data needed for a paper and increasing publication speed. It would also encourage more studies that try to replicate previous results.


  2. There are a couple of points in your reply I feel need to be addressed.

    Firstly, PLoS One’s rejection of half its submissions for basic failure of methodology was not held up by the Economist as a weakness of peer review. Instead, the Economist introduced this as “another line of evidence suggesting that a lot of scientific research is poorly thought through, or executed, or both”.

    This is a difficulty. How did we get to a position where such large proportions of papers are so bad? Many of these papers will eventually be published somewhere. And no, perhaps it’s not a “high impact” journal that they land in, but that such bad work is done in the first place ought to be a cause for concern.

    Also, more seriously, you neglect to address the stronger line that was used to suggest weaknesses in peer review – i.e. the BMJ’s peer review investigation.

    Secondly, you blithely say, “if an average paper uses three pieces of evidence to support one hypothesis with each having a confidence of 5% then the probability that the observations occurred by chance would be 0.05^3 or 0.000125.” This is true only if the confidences of these pieces of evidence are wholly independent. In reality, we see interdependence frequently, and worse, well-meaning scientists falling into the Texas Sharpshooter Fallacy, which makes a mockery even of the initial 0.05.

    The Economist’s article is not so bad, and dovetails in many ways with Dr. Ben Goldacre’s criticisms of various scientific abuses. There is a lot of very badly done science out there, and while no, it will not last, it sucks up resources that could be better spent if more care was given to good methodology.

    1. If I can be so bold as to take a stab at interpreting Dr. Waters’ comment about the multiple-figure approach and its robustness against statistical fallacies (and I agree wholeheartedly with you re: methodology):

      My immediate interpretation (coming from a life sciences perspective) is that “multiple” implies diversity AND quantity, as opposed to just quantity. In other words, three immunoblots under identical conditions, but for different proteins, is unlikely to be a robust defense against systematic artifacts compared to a single immunoblot. However, an immunoblot, gene expression data, and a functional assay comprise an entirely different scenario.

      This ties in to another facet of this discussion that the Economist article circles around, but does not directly address: determining the applicability of published data. If an experimental system is technically consistent, and the data are properly analyzed, the results are very likely true within the constraints of that specific system and experimental conditions. If that is all that is presented, then that is the extent to which the findings may be applied. No more, no less. Unfortunately, this requires detailed information and data on methodology, and as we are well aware, the methods section has been shrinking at an alarming pace, with little enforcement for detailed documentation and validation of methods during peer review.

      Solely telling me that you performed method ABC according to a modification of the original method published by Doe et al. in 1992 isn’t terribly helpful; to verify this, I would need to find the reference, read the paper, and (assuming Doe et al. (1992) didn’t cite another paper for their method) see if the source method is transferable to your specific system and conditions, and if not, whether you provide data to validate the multiple dimensions that comprise a qualified assay. Rinse and repeat for every single method cited as such. And, because method validation data isn’t commonly required, and so much of biological research is presented in a relative fashion (i.e. ratio/percentage vs. controls), the next person who wants to use a method often has no idea what he/she should be seeing on an absolute basis.

      1. Thank you both for your comments. I agree that my statistical analysis is a vast oversimplification and it was meant to be “tongue in cheek” so to speak. Victor is spot on about my meaning, though, as in my field there are usually multiple different pieces of evidence supporting a hypothesis or model. For example, in a recent Molecular Microbiology paper that was just accepted from my group (PubMed ID: 24134710), we showed that a small signaling molecule inhibited binding of a transcription factor (TF) to DNA with all of these pieces of evidence:

        1. Target gene expression in vivo is inhibited in the presence of the small molecule.
        2. The purified TF does not bind to target DNA in the presence of the molecule, as shown by two independent biochemical assays (EMSA and DNase I footprinting).
        3. We isolated mutants of the TF that no longer bind to the small molecule. With these mutants, expression of the target genes is not inhibited by the molecule in vivo.
        4. The ability of these purified mutants to bind DNA in vitro is not affected by the small molecule.

        So, we have multiple lines of evidence supporting our hypotheses. How do you put a statistical significance on experiments like these? I’m not sure. But I do think the Economist article was quite misleading and flawed by suggesting only one experiment with a p value of 0.05 is used to support a hypothesis. As a disclaimer, I am a molecular microbiologist, so I can’t comment on other fields. For larger experiments like epidemiological analyses or clinical trials the point of the article might be more relevant. But these studies usually rely on advanced statistical analyses for that reason.

        In regards to Peter’s second point, I did not comment on that article as I have not had time to directly examine it. All I can say is that it doesn’t agree with my own experiences. As mentioned in my reply above, most of my reviews are pages and pages long pointing out many details and flaws, some of which I agree with, others which I don’t. I have never had a review where it appears the reviewers did not put forth their best effort to try and thoroughly critique the submission.

        I agree there are a lot of bad papers that are submitted. I think this is mostly due to the rise of questionable open-access journals (i.e. the ones targeted by the Bohannon study) that will accept anything as long as you pay the publication fee. I think often bad papers are submitted to reputable journals first, knowing that there is a good chance they will be rejected (and they usually are), with the intention to eventually publish in one of these. It is a problem. Most of the people I would consider to be drivers in the field can quickly recognize the wheat from the chaff. But the public is not as tuned into what science is good versus what is crap….
