Reproducible Research with Word?

Kieran Healy’s post describing his manuscript preparation workflow inspired me to write up some notes about how I approach some of the same challenges. Namely, other people.
Healy’s summary is required reading for what follows. But I warn you, if you are still reading my words, his site will consume your attention for a few hours. Don’t blame me when you are fiddling with his templates rather than doing your actual work.

I am 100% sold on the idea of “reproducible” research and the benefits of going from raw data to manuscript tables, text, and figures in one set of files. I will evangelize to my un-R’ed colleagues without much prompting these days. I’ve even started to find ways to bring up the topic with non-work friends who still have AOL accounts. When I have a captive audience, I press “compile” in RStudio and get semi-genuine “oohs” and “ahhs” when the text and tables appear.
But the fact remains that I have a zero percent conversion rate among collaborators. Every one of them—and they are lovely people—is stuck in a workflow centered around Word, Excel, Command-C, and Command-V. So until recently, I’ve just given in and played nicely in Word. As Healy writes, “There’s little to be gained from plain-text dogmatism in a .docx world.”

But what if there were a world in which both approaches could exist?

In his post, Healy goes on to describe a really great setup. As someone who has played around with a lot of the tools he describes, I can agree when he writes that his setup is not as crazy as it looks. But to the uninitiated, it will look crazy…until it doesn’t.
I’m going to go in a different direction and discuss how to combine some of what Healy describes with an olive branch to my Office’rs.

Dropbox vs Git

This is a misleading heading. There is no “vs” among the Office’rs. If they download Dropbox, consider it a win and go home. There’s no getting them to git. But they don’t need to. If you are “taking the lead” on the data processing and analysis, then the Universe grants you the right to put it under version control. But don’t expect anyone to use it (except for research assistants who can’t refuse).

Don’t despair. Share.

[Screenshot: the root folder “collaborate”]

Go ahead and create a repository for your study. I’ll use a subdirectory of a repo I have for StackOverflow examples. Grab my files here. Here’s a screenshot of the root folder “collaborate”. This repo exists in my Dropbox folder, but I don’t share it with anyone (though it is available to collaborators for forking on Bitbucket). The only folder I share via Dropbox is sections in collaborate/reports/JofImportantThings.

Look, right there in sections are some good ol’ .docx files! Collaborators are free to write in Word and track changes all day and night. All they have to do is use some minimal LaTeX: \subsection{heading} for headings, \cite{key} for references, ``quotes'' for quotation marks, and a backslash to escape special characters such as \%, \$, and \&.
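To make that concrete, here is a made-up snippet of what a collaborator might type into a Word section. The heading, the sentences, and the citation key examplekey are all invented for illustration; nothing here comes from the actual repo.

```latex
\subsection{Background}

Plain-text workflows have been described as ``not as crazy as they look'' \cite{examplekey}.
Still, 100\% of my collaborators write in Word \& Excel, and the switching cost is more than \$0.
```

Everything else is ordinary typing, so track changes and comments keep working as usual.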

Master of the House

All of the fun happens in manuscript-master.Rnw. My example is boring and produces an ugly pdf, but it gets the point across. In a real example, I’d start with the LaTeX template provided by the journal or create one based on the journal’s manuscript preparation guidelines.

The first code chunk is standard stuff for knitr and beyond the scope of what I want to discuss here. The second chunk, intro, is where things get interesting. Since I am working on a Mac, I call textutil via system() to convert the current version of the introduction.docx file my collaborators have been editing into a .txt file. Then I use file.rename() to change the file from .txt to .tex. In the LaTeX portion of the document, I pull the .tex file into the manuscript via \input{sections/introduction}. I do this for all of the non-analysis sections, e.g., Introduction, Method, Discussion. Boom! All of my collaborators’ work is automatically inserted into the manuscript. No formatting required.
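For readers who want to see the shape of that chunk, here is a minimal sketch in R. It is not the code from my repo: the helper names section_paths() and docx_to_tex() are hypothetical, and it assumes a Mac (for textutil) and a sections folder sitting next to the master .Rnw file.

```r
# Hypothetical helpers (illustration only): build the file paths for one
# manuscript section, then convert its .docx to .tex with macOS's textutil.

section_paths <- function(section, dir = "sections") {
  base <- file.path(dir, section)
  c(docx = paste0(base, ".docx"),
    txt  = paste0(base, ".txt"),
    tex  = paste0(base, ".tex"))
}

docx_to_tex <- function(section, dir = "sections") {
  p <- section_paths(section, dir)
  # textutil writes <section>.txt next to the .docx; system() returns the
  # command's exit status (0 on success).
  status <- system(paste("textutil -convert txt", shQuote(p[["docx"]])))
  if (status != 0 || !file.exists(p[["txt"]])) {
    stop("textutil could not convert ", p[["docx"]])
  }
  # Rename .txt to .tex so LaTeX can pull the section in with \input{}.
  file.rename(p[["txt"]], p[["tex"]])
  invisible(p[["tex"]])
}
```

Inside the intro chunk this would be called as docx_to_tex("introduction"), and \input{sections/introduction} in the LaTeX body picks up the result. Splitting out the path-building keeps the system call to one line and makes it easy to loop over several sections.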

Sideshow

I do things a bit differently in the Results section. I want to keep the writing with the munging/analysis code in R. To keep the master typesetting file simple, I use knitr’s child chunk option to pull in a separate analysis .Rnw file:
<<analysis, child='manuscript-analysis.Rnw', include=FALSE>>=

@

If you download manuscript-analysis.Rnw from my repo, you’ll see that this file contains all of the munging/analysis R script, the text of the results section, and LaTeX commands to create tables and figures.
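If you just want the general shape without downloading anything, a child file along these lines would do the job. This is a hypothetical stand-in, not the contents of my manuscript-analysis.Rnw: it uses R’s built-in mtcars data, and the chunk label and object names are made up.

```latex
\section{Results}

<<analysis-setup, include=FALSE>>=
# Hypothetical stand-in for the real munging/analysis code,
# using R's built-in mtcars data (wt is in units of 1,000 lb).
fit <- lm(mpg ~ wt, data = mtcars)
slope <- round(coef(fit)[["wt"]], 2)
@

Each additional 1,000 lb of weight was associated with a change of
\Sexpr{slope} miles per gallon.
```

The point is that the number in the sentence comes from \Sexpr{}, so it updates whenever the data or the model changes.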

Putting It All Together

Click “Compile PDF” in RStudio and be amazed. Colleagues want to make some edits in their Word sections? No problem. Just press compile again. New data? No problem. Just press compile again.
