21  Software development practices

Second edition

You are reading the work-in-progress second edition of R Packages. This chapter should be readable but is currently undergoing final polishing.

21.1 Introduction

In this last part of the book, we zoom back out to consider development practices that can make you more productive and raise the quality of your work. Here we’ll discuss the use of version control and continuous integration. In Chapter 22 we discuss how the nature of package maintenance varies over the lifecycle of a package.

You will notice that we recommend using certain tools:

  • An integrated development environment (IDE). In Section 5.3 we encouraged the use of the RStudio IDE for package development work. That’s what we document, since it’s what we use and devtools is developed to work especially well with RStudio. But even if it’s not RStudio, we strongly recommend working with an IDE that has specific support for R and R package development.

  • Version control. We strongly recommend the use of formal version control and, at this point in time, Git is the obvious choice. We say that based on Git’s general prevalence and, specifically, its popularity within the R package ecosystem. In Section 21.2, we explain why we think version control is so important.

  • Hosted version control. We strongly recommend syncing your local Git repositories to a hosted service and, at this point in time, GitHub is the or, at least, “an” obvious choice. This is also covered in Section 21.2.

  • Continuous integration and deployment, a.k.a. CI/CD (or even just CI). This terminology comes from the general software engineering world and can sound somewhat grandiose or intimidating when applied to your personal R package. All this really means is that you set up specific package development tasks to happen automatically when you push new work to your hosted repository. Typically you’ll want to run R CMD check and to re-build and deploy your package website. In Section 21.3, we show how to do this with GitHub Actions.

You might think that these pro-style tools are overkill for someone who doesn’t do software development for a living. While we don’t recommend forcing yourself to do everything above on day one of your first “hello world” project, we actually do believe these tools are broadly applicable for R package development.

The main reason is that these tools make it so much easier to do the right thing, e.g. to experiment, document, test, check, and collaborate. By adopting a shared toolkit, part-time and newer package developers gain access to the same workflows used by experts. This requires a certain amount of faith and upfront investment, but we believe this pays off.

21.2 Git and GitHub

Git is a version control system that was originally built to coordinate the work of a global group of developers working on the Linux kernel. Git manages the evolution of a set of files — called a repository — in a highly structured way and we recommend that every R package should also be a Git repository (and also, probably, an RStudio Project; Section 5.3).

A solo developer, working on a single computer, will benefit from adopting version control. But, for most of us, that benefit is not nearly large enough to make up for the pain of installing and using Git. In our opinion, for most folks, the pros of Git only outweigh the cons once you take the additional step of hooking your local repository up to a remote host like GitHub. The joint use of Git and GitHub offers many benefits that more than justify the learning curve.

21.2.1 Standard practice

This recommendation is well aligned with the current, general practices in software development. Here are a few relevant facts from the 2022 Stack Overflow developer survey, which is based on about 70K responses.

  • 94% report using Git. The second-most used version control system was SVN, used by 5% of respondents.

  • For personal projects, 87% of respondents report using GitHub, followed by GitLab (21%) and Bitbucket (11%). The ranking is the same albeit less skewed for professional work: GitHub still dominates with 56%, followed by GitLab (29%) and Bitbucket (18%).

We can even learn a bit about the habits of R package developers, based on the URLs found in the DESCRIPTION files of CRAN packages. As of December 2022, there are about 18K packages on CRAN, of which about 55% have a non-empty URL field (over 10K). Of those, 80% have a GitHub URL (over 8K), followed by GitLab (just over 1%) and Bitbucket (around 0.5%).

The prevalence of Git/GitHub, both within the R community and beyond, should help you feel confident that adoption will have tangible benefits. Furthermore, the sheer popularity of these tools means there are lots of resources available for learning how to use Git and GitHub and for getting unstuck1.

Two specific resources that address the intersection of Git/GitHub and the R world are the website Happy Git and GitHub for the useR and the article “Excuse me, do you have a moment to talk about version control?” (Bryan 2018).

We conclude this section with a few examples of why Git/GitHub can be valuable specifically for R package development:

  • Communication with users: GitHub Issues are well-suited for taking bug reports and feature requests. Unlike email sent to the maintainer, these conversations are accessible to others and searchable.

  • Collaboration: GitHub pull requests are a very low-friction way for outside contributors to help fix bugs and add features.

  • Distribution: Functions like devtools::install_github("r-lib/devtools") and pak::pak("r-lib/devtools") allow people to easily install the development version of your package, based on a source repository. More generally, anyone can install your package from any valid Git ref, such as a branch, specific SHA, pull request, or tag.

  • Website: GitHub Pages is one of the easiest ways to offer a website for your package (Section 20.3).

  • Continuous integration: This is actually the topic of the next section, so read on for more.

21.3 Continuous integration

As we said in the introduction, continuous integration and deployment is commonly abbreviated as CI/CD or just CI. For R package development, what this means in practice is:

  1. You host your source package on a platform like GitHub. The key point is that the hosted repository provides the formal structure for integrating the work of multiple contributors. Sometimes multiple developers have permission to push (this is how tidyverse and r-lib packages are managed). In other cases, only the primary maintainer has push permission. In either model, external contributors can propose changes via a pull request.

  2. You configure one or more development tasks to execute automatically when certain events happen in the hosted repository, such as a push or a pull request. For example, for an R package, it’s extremely valuable to configure an automatic run of R CMD check. This helps you discover breakage quickly, when it’s easier to diagnose and fix, and is a tremendous help for evaluating whether to accept an external contribution.

Overall, the use of hosted version control and continuous integration can make development move more smoothly and quickly.

Even for a solo developer, having R CMD check run remotely, possibly on a couple of different operating systems, is a mighty weapon against the dreaded “works on my machine” problem. Especially for packages destined for CRAN, the use of CI decreases the chance of nasty surprises right before release.

21.3.1 GitHub Actions

The easiest way to start using CI is to host your package on GitHub and use its companion service, GitHub Actions (GHA). Then you can use various functions from usethis to configure so-called GHA workflows. usethis copies workflow configuration files from r-lib/actions, which is where the tidyverse team maintains GHA infrastructure useful to the R community.

21.3.2 R CMD check via GHA

If you only use CI for one thing, it should be to run R CMD check. If your package has a very limited audience, this will configure an entry-level solution:

# for a package with a limited audience

However, most packages will benefit from more rigorous checks, i.e. checking on Linux, macOS, and Windows. Here’s how to configure that:

You will see some messages along these lines:

#> ✔ Creating '.github/'
#> ✔ Adding '*.html' to '.github/.gitignore'
#> ✔ Creating '.github/workflows/'
#> ✔ Saving 'r-lib/actions/examples/check-standard.yaml@v2' to .github/workflows/R-CMD-check.yaml'
#> • Learn more at <https://github.com/r-lib/actions/blob/v2/examples/README.md>.
#> ✔ Adding R-CMD-check badge to 'README.md'

The key things that happen here are:

  • A new GHA workflow file is written below .github/workflows/. GHA workflows are specified via YAML files. The message reveals the source of the YAML and gives a link to learn more.

  • Some helpful additions may be made to various “ignore” files.

  • A badge reporting the R CMD check result is added to your README, if it has been created with usethis and has an identifiable badge “parking area”. Otherwise, you’ll be given some text you can copy and paste.

Commit these file changes and push to GitHub. If you visit the “Actions” section of your repository, you should see that a GHA workflow run has been launched. In due course, its success (or failure) will be reported there, in your README badge, and in your GitHub notifications (depending on your personal settings).

Congratulations! Your package will now benefit from even more regular checks.

21.3.3 Other uses for GHA

Since GHA allows you to run arbitrary code, there are many other things that you can use it for:

  • Building your package’s website and deploying the rendered site to GitHub Pages, as described in Section 20.3. See also ?usethis::use_pkgdown_github_pages().

  • Tracking the test coverage of your package, as described in Section 15.1.1. This can be set up with use_github_action("test-coverage").

  • Re-publishing a book website every time you make a change to the source. (Like we do for this book!).

Remember that r-lib/actions is a great place to learn more about using GHA for various R-related tasks. In addition to the pre-configured examples featured here, this repository also contains the lower-level building blocks that you could use to be build your own custom workflows, such as actions to install R or to install all of the dependencies indicated in a DESCRIPTION file.

  1. We feature GitHub here, for hosted version control, because it’s what we use and what has the best support in devtools. However, all the big-picture principles and even some details hold up for alternative platforms, such as Gitlab and Bitbucket.↩︎