Chapter 1 Introduction

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. As of June 2019, there were over 14,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearing house for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package.

If you’re reading this book, you already know how to use packages:

  • You install them from CRAN with install.packages("x").
  • You use them in R with library("x").
  • You get help on them with package?x and help(package = "x").

The goal of this book is to teach you how to develop packages so that you can write your own, not just use other people’s. Why write a package? One compelling reason is that you have code that you want to share with others. Bundling your code into a package makes it easy for other people to use it, because like you, they already know how to use packages. If your code is in a package, any R user can easily download it, install it and learn how to use it.

But packages are useful even if you never share your code. As Hilary Parker says in her introduction to packages: “Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time.” Organising code in a package makes your life easier because packages come with conventions. For example, you put R code in R/, you put tests in tests/ and you put data in data/. These conventions are helpful because:

  • They save you time — you don’t need to think about the best way to organise a project, you can just follow a template.

  • Standardised conventions lead to standardised tools — if you buy into R’s package conventions, you get many tools for free.
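To make these conventions concrete, here is a minimal sketch that lays out the conventional directories by hand, using only base R. The package name toypkg and the helper make_skeleton() are hypothetical, chosen purely for illustration; in practice, the tools introduced in this book create this structure for you.

```r
# Hypothetical helper: create the conventional top-level directories
# for a package called `name` underneath `parent`.
make_skeleton <- function(name, parent = tempdir()) {
  root <- file.path(parent, name)
  # R/ holds code, tests/ holds tests, data/ holds data
  for (d in c("R", "tests", "data")) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  root
}

root <- make_skeleton("toypkg")
dir.exists(file.path(root, c("R", "tests", "data")))
#> [1] TRUE TRUE TRUE
```

The skeleton is written to a temporary directory so that experimenting with it does not touch your real projects.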

It’s even possible to use packages to structure your data analyses, as described by Marwick, Boettiger, and Mullen (2018a, 2018b).

1.1 Philosophy

This book espouses our philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.

This philosophy is realised primarily through the devtools package, which is the public face for a suite of R functions that automate common development tasks. The release of version 2.0.0 in October 2018 marked its internal restructuring into a set of more focused packages, with devtools becoming more of a meta-package. The usethis package is the sub-package you are most likely to interact with directly; we explain the devtools-usethis relationship in section 1.4.

As always, the goal of devtools is to make package development as painless as possible. It encapsulates the best practices developed by first author Hadley Wickham, initially from years as a prolific solo developer. More recently, he has assembled a team of ~10 developers at RStudio, who collectively look after ~150 open source R packages, including those known as the tidyverse. The reach of this team allows us to explore the space of all possible mistakes at an extraordinary scale. Fortunately, it also affords us the opportunity to reflect on both the successes and failures, in the company of expert and sympathetic colleagues. We try to develop practices that make life more enjoyable for both the maintainer and users of a package. The devtools meta-package is where these lessons are made concrete.

Through the book, we highlight specific ways that RStudio can expedite your package development workflow, in specially formatted sections like this.

Devtools works hand-in-hand with RStudio, which we believe is the best development environment for most R users. The main alternative is Emacs Speaks Statistics (ESS), which is a rewarding environment if you’re willing to put in the time to learn Emacs and customise it to your needs. The history of ESS stretches back over 20 years (predating R!), but it’s still actively developed and many of the workflows described in this book are also available there. For those loyal to vim, we recommend the Nvim-R plugin.

Together, devtools and RStudio insulate you from the low-level details of how packages are built. As you start to develop more packages, we highly recommend that you learn more about those details. The definitive resource for the details of package development is the official Writing R Extensions manual. However, this manual can be hard to understand if you’re not already familiar with the basics of packages. It’s also exhaustive, covering every possible package component, rather than focussing on the most common and useful components, as this book does. Writing R Extensions is a useful resource once you’ve mastered the basics and want to learn what’s going on under the hood.

1.2 In this book

Chapter 2 runs through the development of a small toy package. It’s meant to paint the Big Picture and suggest a workflow, before we descend into the detailed treatment of the key components of an R package.

The basic structure of a package is explained in chapter 3. Subsequent chapters of the book go into more details about each component. They’re roughly organised in order of importance:

  • R code, chapter 4: the most important directory is R/, where your R code lives. A package with just this directory is still a useful package. (And indeed, if you stop reading the book after this chapter, you’ll have still learned some useful new skills.)

  • Package metadata, chapter 5: the DESCRIPTION lets you describe what your package needs to work. If you’re sharing your package, you’ll also use the DESCRIPTION to describe what it does, who can use it (the license), and who to contact if things go wrong.

  • Documentation, chapter 6: if you want other people (including future-you!) to understand how to use the functions in your package, you’ll need to document them. We’ll show you how to use roxygen2 to document your functions. We recommend roxygen2 because it lets you write code and documentation together while continuing to produce R’s standard documentation format.

  • Vignettes, chapter 7: function documentation describes the nit-picky details of every function in your package. Vignettes give the big picture. They’re long-form documents that show how to combine multiple parts of your package to solve real problems. We’ll show you how to use R Markdown and knitr to create vignettes with a minimum of fuss.

  • Tests, chapter 8: to ensure your package works as designed (and continues to work as you make changes), it’s essential to write unit tests which define correct behaviour, and alert you when functions break. In this chapter, we’ll teach you how to use the testthat package to convert the informal interactive tests that you’re already doing to formal, automated tests.

  • Namespace, chapter 9: to play nicely with others, your package needs to define what functions it makes available to other packages and what functions it requires from other packages. This is the job of the NAMESPACE file and we’ll show you how to use roxygen2 to generate it for you. The NAMESPACE is one of the more challenging parts of developing an R package but it’s critical to master if you want your package to work reliably.

  • External data, chapter 10: the data/ directory allows you to include data with your package. You might do this to bundle data in a way that’s easy for R users to access, or just to provide compelling examples in your documentation.

  • Compiled code, chapter 11: R code is designed for human efficiency, not computer efficiency, so it’s useful to have a tool in your back pocket that allows you to write fast code. The src/ directory allows you to include speedy compiled C and C++ code to solve performance bottlenecks in your package.

  • Other components, chapter 13: this chapter documents the handful of other components that are rarely needed: demo/, exec/, po/ and tools/.
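As a rough illustration of the package metadata item above, the DESCRIPTION file is stored in the plain-text "field: value" format that base R reads and writes with read.dcf() and write.dcf(). The sketch below writes a tiny file to a temporary directory and reads it back; every field value is a placeholder assumed for the example, not a recommendation.

```r
# Write a minimal, illustrative DESCRIPTION file to a temporary location.
# All field values are placeholders.
desc_path <- file.path(tempdir(), "DESCRIPTION")
fields <- data.frame(
  Package = "toypkg",
  Title = "What the Package Does",
  Version = "0.0.0.9000",
  License = "MIT + file LICENSE",
  stringsAsFactors = FALSE
)
write.dcf(fields, file = desc_path)

# Read it back as a one-row character matrix, one column per field
desc <- read.dcf(desc_path)
desc[[1, "Package"]]
#> [1] "toypkg"
```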

The final chapters describe general best practices not specifically tied to one directory:

  • Git and GitHub, chapter 14: mastering a version control system is vital to easily collaborate with others, and is useful even for solo work because it allows you to easily undo mistakes. In this chapter, you’ll learn how to use the popular Git and GitHub combo with RStudio.

  • Automated checking, chapter 15: R provides very useful automated quality checks in the form of R CMD check. Running them regularly is a great way to avoid many common mistakes. The results can sometimes be a bit cryptic, so we provide a comprehensive cheatsheet to help you convert warnings to actionable insight.

  • Release, chapter 16: the life-cycle of a package culminates with release to the public. This chapter compares the two main options (CRAN and GitHub) and offers general advice on managing the process.

This is a lot to learn, but don’t feel overwhelmed. Start with a minimal subset of useful features (e.g. just an R/ directory!) and build up over time. To paraphrase the Zen monk Shunryu Suzuki: “Each package is perfect the way it is — and it can use a little improvement”.

1.3 Prepare your system

To get started, make sure you have the latest version of R (at least 3.6.0, which is the version being used to render this book), then run the following code to get the packages you’ll need:

install.packages(c("devtools", "roxygen2", "testthat", "knitr"))
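If you would rather not reinstall packages you already have, the following sketch checks your R version and then installs only what is missing. The variable names and the interactive() guard are our own assumptions for this example; the guard simply stops the script from trying to install anything when it is run non-interactively.

```r
# Is this R at least the version the book is rendered with?
getRversion() >= "3.6.0"

# Packages this book relies on
needed <- c("devtools", "roxygen2", "testthat", "knitr")

# requireNamespace() returns FALSE, rather than erroring, when a package
# is not installed
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

# Install only what is missing; this step needs an internet connection
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```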

Make sure you have a recent version of the RStudio integrated development environment (IDE). In fact, consider using the preview version and updating regularly. Compared to the official released version, the preview gives you access to the latest and greatest features and only slightly increases your chances of finding a bug. It is distinct from the more volatile daily build.

1.4 devtools, usethis, and you

“I am large, I contain multitudes.”

— Walt Whitman, Song of Myself

After 7 years of development, devtools had grown into a rather unwieldy package, making maintenance difficult. Version 2.0.0, released in late 2018, marked the conscious uncoupling of devtools, with most functionality moving into seven smaller packages. Through various means, devtools continues to expose all its usual functionality, although it is mostly maintained elsewhere. For example, devtools might provide a wrapper function in order to set user-friendly defaults, introduce helpful interactive behaviour, or to combine functionality from multiple sub-packages.

What’s our recommended approach to devtools and its constituent packages? It varies, depending on whether you’re working in useR or developeR mode:

  • For interactive use, useRs should attach devtools and think of it as the provider of your favorite functions for package development.
  • For programmatic use, such as inside another package, developeRs should NOT depend on devtools, but should instead access functions via the package that is their primary home.
    • devtools should rarely appear in the role of foo in a qualified call of the form foo::fcn(). Instead, foo should be the package where fcn() is defined.
    • An exception to this is that we continue to feature devtools::install_github() as the way to install the development version of a package in its README, even though install_github() actually lives in the remotes package. That’s because this piece of advice pertains to interactive use, where we prefer to emphasize devtools.
  • Try to report bugs on the package that is a function’s primary home.

Example of how to simulate installing and loading a package, during interactive development:

library(devtools)
load_all()

If that same functionality is used inside an R package, this is the preferred call:

pkgload::load_all()

Of the constituent packages, usethis is the one that people are most likely to know about and use directly. It now holds the functions that act on the files and folders in an R project, most especially for any project that is also an R package. All functions in usethis are made available by devtools. So, once you attach devtools, you can use any function in usethis without qualification, i.e. just call use_testthat(). If you choose to specify the namespace, such as when working in a more programmatic style, then access usethis functions directly: do usethis::use_testthat() instead of devtools::use_testthat().
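When you are unsure which package to name in a qualified call, one option is to ask R where a function is defined. The helper home_of() below is a hypothetical name introduced for this sketch, and the examples use functions that ship with R so that they run on any installation.

```r
# Hypothetical helper: name of the namespace in which a function is defined
home_of <- function(fn) {
  environmentName(environment(fn))
}

home_of(stats::median)
#> [1] "stats"

home_of(utils::head)
#> [1] "utils"
```

This reports where the function object itself was created. A thin wrapper, such as one that only sets friendlier defaults, will report the wrapping package rather than the package doing the real work, so treat the answer as a hint and confirm it against the documentation.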

1.4.1 Personal startup configuration

You can attach devtools like so:

library(devtools)

But it soon grows aggravating to repeatedly attach devtools in every R session. Therefore, we strongly recommend attaching devtools in your .Rprofile startup file, like so:

if (interactive()) {
  suppressMessages(require(devtools))
}

For convenience, the function use_devtools() creates .Rprofile, if needed, opens it for editing, and puts the necessary lines of code on the clipboard and the screen. Another package you may want to handle this way is testthat.

In general, it’s a bad idea to attach packages in .Rprofile, as it invites you to create R scripts that don’t reflect all of their dependencies via explicit calls to library(foo). But devtools is a workflow package that smooths the process of package development and is, therefore, unlikely to get baked into any analysis scripts. Note how we still take care to only attach in interactive sessions.

The following code installs the development versions of devtools and usethis, which may be important during the revision of the book.

devtools::install_github("r-lib/devtools")
devtools::install_github("r-lib/usethis")

1.5 R build toolchain

To be fully capable of building R packages from source, you’ll also need a compiler and a few other command line tools. This may not be strictly necessary until you want to build packages containing C or C++ code (the topic of chapter 11). Especially if you are using RStudio, you can set this aside for now. The IDE will alert you and provide support once you try to do something that requires you to set up your development environment. Read on for advice on doing this yourself.

1.5.1 Windows

On Windows the collection of tools needed for building packages from source is called Rtools.

Rtools is NOT an R package. It is NOT installed with install.packages(). Instead, download it from https://cran.r-project.org/bin/windows/Rtools/ and run the installer.

During the Rtools installation you may see a window asking you to “Select Additional Tasks”.

  • Do not select the box for “Edit the system PATH”. devtools and RStudio should put Rtools on the PATH automatically when it is needed.
  • Do select the box for “Save version information to registry”. It should be selected by default.

1.5.2 macOS

You need to install the Xcode command line tools, which requires that you register as an Apple developer (don’t worry, it’s free).

Then, in the shell, do:

xcode-select --install

Alternatively, you can install the current release of full Xcode from the Mac App Store. This includes a great deal that you do not need, but it offers the advantage of App Store convenience.

1.5.3 Linux

Make sure you’ve installed not only R, but also the R development tools. For example, on Ubuntu (and Debian) you need to install the r-base-dev package.

1.5.4 Verify system prep

You can check that you have everything installed and working by running the following code:

library(devtools)
has_devel()
#> [1] TRUE

If everything is ok, it returns TRUE. Otherwise, it will reveal some diagnostic info about the problem.
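If you want a little more detail, here is a rough sketch of two further checks. The first uses only base R to see whether common build tools are visible on the PATH; the choice of make and gcc is an assumption, since the exact tools vary by platform, and on Windows Rtools may be found by devtools even when it is not on the PATH. The second asks pkgbuild, if it is installed, to run its own check with diagnostics switched on.

```r
# Base R check: are common build tools visible on the PATH?
# An empty string means the tool was not found there.
tools <- Sys.which(c("make", "gcc"))
tools

# If pkgbuild is installed, ask it directly; debug = TRUE prints diagnostics
if (requireNamespace("pkgbuild", quietly = TRUE)) {
  pkgbuild::has_build_tools(debug = TRUE)
}
```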

1.6 Acknowledgments

TODO: when updating this, cover conscious uncoupling in version 2.0.0 and Jim Hester taking over as maintainer in February 2018.

The tools in this book wouldn’t be possible without many open source contributors. Winston Chang, my co-author on devtools, spent hours debugging painful S4 and compiled code problems so that devtools can quickly reload code for the vast majority of packages. Kirill Müller contributed great patches to many of my package development packages including devtools, testthat, and roxygen2. Kevin Ushey, JJ Allaire and Dirk Eddelbuettel tirelessly answered all my basic C, C++ and Rcpp questions. Peter Danenberg and Manuel Eugster wrote the first version of roxygen2 during a Google Summer of Code. Craig Citro wrote much of the code to allow Travis to work with R packages.

Often the only way I learn how to do it the right way is by doing it the wrong way first. For suffering through many package development errors, I’d like to thank all the CRAN maintainers, especially Brian Ripley, Uwe Ligges and Kurt Hornik.

This book was written in the open and it is truly a community effort: many people read drafts, fixed typos, suggested improvements, and contributed content. Without those contributors, the book wouldn’t be nearly as good as it is, and I’m deeply grateful for their help. A special thanks goes to Peter Li, who read the book from cover-to-cover and provided many fixes. I also deeply appreciate the time the reviewers (Duncan Murdoch, Karthik Ram, Vitalie Spinu and Ramnath Vaidyanathan) spent reading the book and giving me thorough feedback.

Thanks go to all contributors who submitted improvements via github (in alphabetical order): @aaronwolen, @adessy, Adrien Todeschini, Andrea Cantieni, Andy Visser, @apomatix, Ben Bond-Lamberty, Ben Marwick, Brett K, Brett Klamer, @contravariant, Craig Citro, David Robinson, David Smith, @davidkane9, Dean Attali, Eduardo Ariño de la Rubia, Federico Marini, Gerhard Nachtmann, Gerrit-Jan Schutten, Hadley Wickham, Henrik Bengtsson, @heogden, Ian Gow, @jacobbien, Jennifer (Jenny) Bryan, Jim Hester, @jmarshallnz, Jo-Anne Tan, Joanna Zhao, Joe Cainey, John Blischak, @jowalski, Justin Alford, Karl Broman, Karthik Ram, Kevin Ushey, Kun Ren, @kwenzig, @kylelundstedt, @lancelote, Lech Madeyski, @lindbrook, @maiermarco, Manuel Reif, Michael Buckley, @MikeLeonard, Nick Carchedi, Oliver Keyes, Patrick Kimes, Paul Blischak, Peter Meissner, @PeterDee, Po Su, R. Mark Sharp, Richard M. Smith, @rmar073, @rmsharp, Robert Krzyzanowski, @ryanatanner, Sascha Holzhauer, @scharne, Sean Wilkinson, @SimonPBiggs, Stefan Widgren, Stephen Frank, Stephen Rushe, Tony Breyal, Tony Fischetti, @urmils, Vlad Petyuk, Winston Chang, @winterschlaefer, @wrathematics, @zhaoy.

The light bulb image used for workflow tips comes from www.vecteezy.com.

1.7 Conventions

Throughout this book, I write foo() to refer to functions, bar to refer to variables and function parameters, and baz/ to paths.

Larger code blocks intermingle input and output. Output is commented so that if you have an electronic version of the book, e.g., http://r-pkgs.had.co.nz, you can easily copy and paste examples into R. Output comments look like #> to distinguish them from regular comments.
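For instance, a short block following this convention looks like the sketch below. Because the output lines are comments, the whole block can be pasted into R as-is.

```r
x <- c(1, 2, 3)
mean(x)
#> [1] 2

# A regular comment starts with a plain `#`
rev(x)
#> [1] 3 2 1
```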

1.8 Colophon

This book was written in R Markdown inside RStudio. knitr and pandoc converted the raw R Markdown to HTML and PDF. The website was made with Jekyll, styled with Bootstrap, and automatically published to Amazon’s S3 by Travis CI. The complete source is available from GitHub.

This version of the book was built with:

library(roxygen2)
library(testthat)
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2017-01-27)
#>  os       Ubuntu 16.04.6 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_US.UTF-8                 
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2019-06-07                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                         
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                 
#>  backports     1.1.4      2019-04-10 [1] CRAN (R 3.6.0)                 
#>  bookdown      0.11       2019-05-28 [1] CRAN (R 3.6.0)                 
#>  callr         3.2.0      2019-03-15 [1] CRAN (R 3.6.0)                 
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)                 
#>  commonmark    1.7        2018-12-01 [1] CRAN (R 3.6.0)                 
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)                 
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)                 
#>  devtools      2.0.2.9000 2019-06-06 [1] Github (r-lib/devtools@8c97dee)
#>  digest        0.6.19     2019-05-20 [1] CRAN (R 3.6.0)                 
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)                 
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                 
#>  glue          1.3.1      2019-03-12 [1] CRAN (R 3.6.0)                 
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.6.0)                 
#>  knitr         1.23       2019-05-18 [1] CRAN (R 3.6.0)                 
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)                 
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                 
#>  pkgbuild      1.0.3      2019-03-20 [1] CRAN (R 3.6.0)                 
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                 
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)                 
#>  processx      3.3.1      2019-05-08 [1] CRAN (R 3.6.0)                 
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                 
#>  purrr         0.3.2      2019-03-15 [1] CRAN (R 3.6.0)                 
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.0)                 
#>  Rcpp          1.0.1      2019-03-17 [1] CRAN (R 3.6.0)                 
#>  remotes       2.0.4      2019-04-10 [1] CRAN (R 3.6.0)                 
#>  rlang         0.3.4      2019-04-07 [1] CRAN (R 3.6.0)                 
#>  rmarkdown     1.13       2019-05-22 [1] CRAN (R 3.6.0)                 
#>  roxygen2    * 6.1.1      2018-11-07 [1] CRAN (R 3.6.0)                 
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)                 
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                 
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                 
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                 
#>  testthat    * 2.1.1      2019-04-23 [1] CRAN (R 3.6.0)                 
#>  usethis       1.5.0      2019-04-07 [1] CRAN (R 3.6.0)                 
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)                 
#>  xfun          0.7        2019-05-14 [1] CRAN (R 3.6.0)                 
#>  xml2          1.2.0      2018-01-24 [1] CRAN (R 3.6.0)                 
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)                 
#> 
#> [1] /home/travis/R/Library
#> [2] /usr/local/lib/R/site-library
#> [3] /home/travis/R-bin/lib/R/library

References

Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018a. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1). Taylor & Francis:80–88. https://doi.org/10.1080/00031305.2017.1375986.

Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018b. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” PeerJ Preprints 6 (March):e3192v2. https://doi.org/10.7287/peerj.preprints.3192v2.