20  Software development practices

In this last part of the book, we zoom back out to consider development practices that can make you more productive and raise the quality of your work. Here we’ll discuss the use of version control and continuous integration. In Chapter 21 we discuss how the nature of package maintenance varies over the lifecycle of a package.

You will notice that we recommend using certain tools:

You might think that these pro-style tools are overkill for someone who doesn’t do software development for a living. While we don’t recommend forcing yourself to do everything above on day one of your first “hello world” project, we actually do believe these tools are broadly applicable for R package development.

The main reason is that these tools make it so much easier to do the right thing, e.g. to experiment, document, test, check, and collaborate. By adopting a shared toolkit, part-time and newer package developers gain access to the same workflows used by experts. This requires a certain amount of faith and upfront investment, but we believe this pays off.

20.1 Git and GitHub

Git is a version control system that was originally built to coordinate the work of a global group of developers working on the Linux kernel. Git manages the evolution of a set of files — called a repository — in a highly structured way and we recommend that every R package should also be a Git repository (and also, probably, an RStudio Project; Section 4.2).

A solo developer, working on a single computer, will benefit from adopting version control. But, for most of us, that benefit is not nearly large enough to make up for the pain of installing and using Git. In our opinion, for most folks, the pros of Git only outweigh the cons once you take the additional step of hooking your local repository up to a remote host like GitHub. The joint use of Git and GitHub offers many benefits that more than justify the learning curve.

20.1.1 Standard practice

This recommendation is well aligned with the current, general practices in software development. Here are a few relevant facts from the 2022 Stack Overflow developer survey, which is based on about 70K responses.

  • 94% report using Git. The second-most used version control system was SVN, used by 5% of respondents.

  • For personal projects, 87% of respondents report using GitHub, followed by GitLab (21%) and Bitbucket (11%). The ranking is the same albeit less skewed for professional work: GitHub still dominates with 56%, followed by GitLab (29%) and Bitbucket (18%).

We can even learn a bit about the habits of R package developers, based on the URLs found in the DESCRIPTION files of CRAN packages. As of March 2023, there are about 19K packages on CRAN, of which about 55% have a non-empty URL field (over 10K). Of those, 80% have a GitHub URL (over 8K), followed by GitLab (just over 1%) and Bitbucket (around 0.5%).

The prevalence of Git/GitHub, both within the R community and beyond, should help you feel confident that adoption will have tangible benefits. Furthermore, the sheer popularity of these tools means there are lots of resources available for learning how to use Git and GitHub and for getting unstuck1.

Two specific resources that address the intersection of Git/GitHub and the R world are the website Happy Git and GitHub for the useR and the article “Excuse me, do you have a moment to talk about version control?” (Bryan 2018).

We conclude this section with a few examples of why Git/GitHub can be valuable specifically for R package development:

  • Communication with users: GitHub Issues are well-suited for taking bug reports and feature requests. Unlike email sent to the maintainer, these conversations are accessible to others and searchable.

  • Collaboration: GitHub pull requests are a very low-friction way for outside contributors to help fix bugs and add features.

  • Distribution: Functions like devtools::install_github("r-lib/devtools") and pak::pak("r-lib/devtools") allow people to easily install the development version of your package, based on a source repository. More generally, anyone can install your package from any valid Git ref, such as a branch, specific SHA, pull request, or tag.

  • Website: GitHub Pages is one of the easiest ways to offer a website for your package (Section 19.2).

  • Continuous integration: This is actually the topic of the next section, so read on for more.

20.2 Continuous integration

As we said in the introduction, continuous integration and deployment is commonly abbreviated as CI/CD or just CI. For R package development, what this means in practice is:

  1. You host your source package on a platform like GitHub. The key point is that the hosted repository provides the formal structure for integrating the work of multiple contributors. Sometimes multiple developers have permission to push (this is how tidyverse and r-lib packages are managed). In other cases, only the primary maintainer has push permission. In either model, external contributors can propose changes via a pull request.

  2. You configure one or more development tasks to execute automatically when certain events happen in the hosted repository, such as a push or a pull request. For example, for an R package, it’s extremely valuable to configure an automatic run of R CMD check. This helps you discover breakage quickly, when it’s easier to diagnose and fix, and is a tremendous help for evaluating whether to accept an external contribution.

Overall, the use of hosted version control and continuous integration can make development move more smoothly and quickly.

Even for a solo developer, having R CMD check run remotely, possibly on a couple of different operating systems, is a mighty weapon against the dreaded “works on my machine” problem. Especially for packages destined for CRAN, the use of CI decreases the chance of nasty surprises right before release.

20.2.1 GitHub Actions

The easiest way to start using CI is to host your package on GitHub and use its companion service, GitHub Actions (GHA). Then you can use various functions from usethis to configure so-called GHA workflows. usethis copies workflow configuration files from r-lib/actions, which is where the tidyverse team maintains GHA infrastructure useful to the R community.

20.2.2 R CMD check via GHA

If you only use CI for one thing, it should be to run R CMD check. If you call usethis::use_github_action() with no arguments, you can choose from a few of the most useful workflows. Here’s what that menu looks like at the time of writing:

> use_github_action()
Which action do you want to add? (0 to exit)
(See <https://github.com/r-lib/actions/tree/v2/examples> for other options) 

1: check-standard: Run `R CMD check` on Linux, macOS, and Windows
2: test-coverage: Compute test coverage and report to https://about.codecov.io
3: pr-commands: Add /document and /style commands for pull requests


check-standard is highly recommended, especially for any package that is (or aspires to be) on CRAN. It runs R CMD check across a few combinations of operating system and R version. This increases your chances of quickly detecting code that relies on the idiosyncrasies of a specific platform, while it’s still easy to make the code more portable.

After making that selection, you will see some messages along these lines:

#> ✔ Creating '.github/'
#> ✔ Adding '*.html' to '.github/.gitignore'
#> ✔ Creating '.github/workflows/'
#> ✔ Saving 'r-lib/actions/examples/check-standard.yaml@v2' to .github/workflows/R-CMD-check.yaml'
#> • Learn more at <https://github.com/r-lib/actions/blob/v2/examples/README.md>.
#> ✔ Adding R-CMD-check badge to 'README.md'

The key things that happen here are:

  • A new GHA workflow file is written to .github/workflows/R-CMD-check.yaml. GHA workflows are specified via YAML files. The message reveals the source of the YAML and gives a link to learn more.

  • Some helpful additions may be made to various “ignore” files.

  • A badge reporting the R CMD check result is added to your README, if it has been created with usethis and has an identifiable badge “parking area”. Otherwise, you’ll be given some text you can copy and paste.

Commit these file changes and push to GitHub. If you visit the “Actions” section of your repository, you should see that a GHA workflow run has been launched. In due course, its success (or failure) will be reported there, in your README badge, and in your GitHub notifications (depending on your personal settings).

Congratulations! Your package will now benefit from even more regular checks.

20.2.3 Other uses for GHA

As suggested by the interactive menu, usethis::use_github_action() gives you access to pre-made workflows other than R CMD check. In addition to the featured choices, you can use it to configure any of the example workflows in r-lib/actions by passing the workflow’s name. For example:

  • use_github_action("test-coverage") configures a workflow to track the test coverage of your package, as described in Section 14.1.1.

Since GHA allows you to run arbitrary code, there are many other things that you can use it for:

  • Building your package’s website and deploying the rendered site to GitHub Pages, as described in Section 19.2. See also ?usethis::use_pkgdown_github_pages().

  • Re-publishing a book website every time you make a change to the source. (Like we do for this book!).

If the example workflows don’t cover your exact use case, you can also develop your own workflow. Even in this case, the example workflows are often useful as inspiration. The r-lib/actions repository also contains important lower-level building blocks, such as actions to install R or to install all of the dependencies indicated in a DESCRIPTION file.

  1. We feature GitHub here, for hosted version control, because it’s what we use and what has the best support in devtools. However, all the big-picture principles and even some details hold up for alternative platforms, such as Gitlab and Bitbucket.↩︎