Chapter 5 Reproducibility

by Jade Benjamin-Chung

Our lab adopts the following practices to maximize the reproducibility of our work.

  1. Design studies with appropriate methodology and adherence to best practices in epidemiology and biostatistics
  2. Register study protocols
  3. Write and register pre-analysis plans
  4. Create reproducible workflows
  5. Process and analyze data with internal replication and masking
  6. Use reporting checklists with manuscripts
  7. Publish preprints
  8. Publish data (when possible) and replication scripts

5.1 What is the reproducibility crisis?

Over the past decade, a growing number of studies have found that published findings could not be reproduced. Researchers were unable to reproduce published estimates 1) using the same data with the same or similar code and 2) using newly collected data from the same (or a similar) study design. These “failures” of reproducibility were frequent enough, and spanned enough disciplines (including epidemiology, psychology, and economics), to be deeply troubling. Program and policy decisions based on erroneous findings can waste resources and, at worst, harm the intended beneficiaries. This crisis has motivated new practices that promote reproducibility, transparency, and openness. Our lab is committed to adopting these practices, and much of the remainder of this lab manual focuses on how to do so.

Recommended readings on the “reproducibility crisis”:

5.2 Study design

Appropriate study design is beyond the scope of this lab manual and is something trainees develop through their coursework and mentoring.

5.3 Register study protocols

We register all randomized trials on clinicaltrials.gov, and in some cases register observational studies as well.

5.4 Write and register pre-analysis plans

We write pre-analysis plans for most original research projects that are not exploratory, and in some cases for exploratory studies as well. The format and content of pre-analysis plans vary from project to project; an example is available at https://osf.io/tgbxr/. Plans generally include:

  1. Brief background on the study (a condensed version of the introduction section of the paper)
  2. Hypotheses / objectives
  3. Study design
  4. Description of data
  5. Definition of outcomes
  6. Definition of interventions / exposures
  7. Definition of covariates
  8. Statistical power calculation
  9. Statistical analysis:
    • Type of model
    • Covariate selection / screening
    • Standard error estimation method
    • Missing data analysis
    • Assessment of effect modification / subgroup analyses
    • Sensitivity analyses
    • Negative control analyses

5.5 Create reproducible workflows

Reproducible workflows allow a user to reproduce study estimates and ideally figures and tables with a “single click”. In practice, this typically means running a single bash script that sources all replication scripts in a repository. These replication scripts complete data processing, data analysis, and figure/table generation. The following chapters provide detailed guidance on this topic:

  • Code repositories
  • Coding practices
  • Coding style
  • Code publication
  • Working with big data
  • GitHub
  • Unix
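The “single click” idea can be illustrated with a minimal Python sketch of a runner. It executes replication scripts in a fixed order and stops at the first failure. In our workflows this role is typically filled by a bash script, so the Python version is only a stand-in. The script names in `PIPELINE`, the function `run_pipeline`, and the default `Rscript` runner are all assumptions for illustration. None of them are files or tools prescribed by this manual.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical script names; replace with the repository's actual scripts.
PIPELINE = [
    "0-config.R",
    "1-process-data.R",
    "2-analysis.R",
    "3-figures-tables.R",
]


def run_pipeline(scripts, runner=("Rscript",), workdir="."):
    """Run each script in order and stop at the first failure.

    `runner` is the command used to execute each script (assumed to be
    Rscript here). Returns the list of scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        path = (Path(workdir) / script).resolve()
        if not path.exists():
            raise FileNotFoundError(f"missing replication script: {path}")
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later steps never run on incomplete outputs.
        subprocess.run([*runner, str(path)], check=True, cwd=workdir)
        completed.append(script)
    return completed
```

You could save this as a file such as `run_all.py` (a hypothetical name) ending with the line `run_pipeline(PIPELINE)`. The whole analysis then reruns with one command. Stopping at the first non-zero exit status keeps later steps from silently running on stale or incomplete outputs.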

5.6 Process and analyze data with internal replication and masking

See Jade Benjamin-Chung’s video on this topic: https://www.youtube.com/watch?v=WoYkY9MkbRE
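The linked video describes the lab's own procedure. The Python sketch below illustrates only two generic ingredients, under stated assumptions.

The first is masking. It randomly permutes the treatment column so that analysts can write and debug code without seeing real effect estimates.

The second is internal replication. It checks that two independently produced sets of estimates agree within a tolerance.

The column name `tr`, the function names, and the tolerances are all hypothetical. In a cluster-randomized trial, labels would need to be permuted at the cluster level rather than row by row, which this sketch does not do.

```python
import math
import random


def mask_treatment(rows, column="tr", seed=None):
    """Return a copy of `rows` with the treatment column randomly permuted.

    Shuffling (row by row, an assumption) breaks the link between treatment
    and outcome. The input rows are not modified.
    """
    rng = random.Random(seed)
    labels = [row[column] for row in rows]
    rng.shuffle(labels)
    return [{**row, column: label} for row, label in zip(rows, labels)]


def compare_replications(primary, replicate, rel_tol=1e-6, abs_tol=1e-9):
    """List the estimate names on which two independent analyses disagree.

    Each input maps an estimate name (e.g. "risk_difference") to a number.
    A name present in only one analysis also counts as a discrepancy.
    """
    mismatches = []
    for name in sorted(set(primary) | set(replicate)):
        if name not in primary or name not in replicate:
            mismatches.append(name)
        elif not math.isclose(primary[name], replicate[name],
                              rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(name)
    return mismatches
```

An empty list from `compare_replications` means the two analyses agree within the tolerance. Any names it returns show where to look for the source of the discrepancy.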

5.7 Use reporting checklists with manuscripts

Using reporting checklists helps ensure that peer-reviewed articles contain the information readers need to assess the validity of your work and/or attempt to reproduce it. A collection of reporting checklists is available from the EQUATOR Network: https://www.equator-network.org/about-us/what-is-a-reporting-guideline/

5.8 Publish preprints

A preprint is a scientific manuscript that has not been peer reviewed. Preprint servers assign digital object identifiers (DOIs) to preprints, so preprints can be cited in other articles and in grant applications. Because peer review can take many months, publishing a preprint before or during peer review lets other scientists learn from and build on your work immediately. Importantly, NIH allows applicants to include preprint citations in their biosketches. In most cases, we publish preprints on medRxiv.

5.9 Publish data (when possible) and replication scripts

Publishing data and replication scripts allows other scientists to reproduce your work and build upon it. We typically publish data on the Open Science Framework, share links to GitHub repositories, and archive code on Zenodo.

5.10 Use Zotero for citing sources

Zotero is a reference management tool that helps researchers collect, organize, cite, and share sources. In our lab, we use Zotero to organize research articles, manage citations, and stay consistent when collaborating on literature reviews, manuscripts, and presentations. Use the following steps to get started with Zotero:

  • First, download the desktop version of Zotero from the Zotero Download Page.
  • Select the version for your operating system (Windows, macOS, or Linux).
  • Also download the Zotero Connector for your preferred web browser (Chrome, Safari, Firefox, etc.). The connector allows you to save papers and sources directly from your browser into Zotero with a single click.
  • Zotero also offers mobile apps on the Apple App Store and the Google Play Store.
  • When you open Zotero on your desktop for the first time, a pop-up should appear asking you to install the Word Plugin. Install this plugin. It allows Zotero to integrate directly with Microsoft Word for citations and bibliographies.
  • If you accidentally dismiss the pop-up or it does not appear, follow the installation instructions in the Word Plugin Installation Guide.
  • To create a Zotero account, use the Zotero Account Registration page. After you register, you should receive two emails:
    • An email with a validation link
    • A separate email containing a one-time verification code
  • Follow the instructions in the validation email to complete account setup and log in.
  • For additional help getting started, see the Zotero Quick Start Guide.
  • Lastly, ask Dr. Grembi or another lab member to send you an invitation to join the lab’s Zotero group/library.