16 Function documentation

::: {.rmdnote} You are reading the work-in-progress second edition of R Packages. This chapter is undergoing heavy restructuring and may be confusing or incomplete. :::

16.1 Introduction

Documentation is one of the most important aspects of a good package: without it, users won’t know how to use your package! Documentation is also useful for future-you (so you remember what your functions were supposed to do) and for developers extending your package.

In this chapter, you’ll learn about function documentation, as accessed by ? or help(). Function documentation works like a dictionary: it’s helpful if you want to know what a function does, but it won’t help you find the right function for a new situation. That’s one of the jobs of vignettes, which you’ll learn about in the next chapter. In this chapter we’ll focus on documenting functions, but the same ideas apply to documenting datasets, classes and generics, and packages.

Base R provides a standard way of documenting the functions in a package: you write .Rd files in the man/ directory. These files use a custom syntax, loosely based on LaTeX, and are rendered to HTML, plain text, or PDF for viewing. We are not going to write these files directly. Instead, we’ll use the roxygen2 package to generate them from specially formatted comments. There are a few advantages to using roxygen2:

  • Code and documentation are intermingled so that when you modify your code, it’s easy to remember to also update your documentation.

  • You can write using markdown, rather than having to remember another text formatting syntax.

  • roxygen2 dynamically inspects the objects that it documents, so you can skip some boilerplate that you’d otherwise need to write by hand.

  • It provides a number of tools for sharing text between documentation topics and even between topics and vignettes.

You’ll see the generated .Rd files when you work with them in git, but you’ll otherwise rarely need to look at them.

16.2 roxygen2 basics

To get started, we’ll work through the basic roxygen2 workflow and discuss the overall structure of roxygen2 comments, which are organised into blocks and tags.

16.2.1 The documentation workflow

The documentation workflow starts when you add roxygen comments, comments that start with #', to your source file. Here’s a simple example:

#' Add together two numbers
#' 
#' @param x A number.
#' @param y A number.
#' @return The sum of `x` and `y`.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}

Then you’ll press Ctrl/Cmd + Shift + D (or run devtools::document()) to run roxygen2::roxygenise() which generates a man/add.Rd that looks like this:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/across.R
\name{add}
\alias{add}
\title{Add together two numbers}
\usage{
add(x, y)
}
\arguments{
\item{x}{A number.}

\item{y}{A number.}
}
\value{
The sum of \code{x} and \code{y}.
}
\description{
Add together two numbers
}
\examples{
add(1, 1)
add(10, 1)
}

If you’re familiar with LaTeX, this should look familiar since the .Rd format is loosely based on it, and you can read more about it in R extensions. You’ll commit this file to git (since it’s what R requires to make its interactive documentation work), but typically you won’t otherwise need to look at it.

When you use ?add, help("add"), or example("add"), R looks for an .Rd file containing \alias{add}. It then parses the file, converts it into HTML, and displays it. Here’s what the result looks like in RStudio:

To preview the development documentation, devtools uses some tricks to override the usual help functions so they know where to look in your source packages. To activate these tricks, you need to run devtools::load_all() once. So if the development documentation doesn’t appear, you may need to load your package first.

To summarize, there are four steps in the basic roxygen2 workflow:

  1. Add roxygen2 comments to your .R files.

  2. Press Ctrl/Cmd + Shift + D or run devtools::document() to convert roxygen comments to .Rd files.

  3. Preview documentation with ?.

  4. Rinse and repeat until the documentation looks the way you want.
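
At the R console, steps 2 and 3 might look like the following minimal sketch. It assumes that devtools is installed and that the working directory is the top level of your package (the folder containing DESCRIPTION); the guard is only there so the snippet does nothing when those assumptions don't hold.

```r
# Assumption: we are in the package root and devtools is available.
in_package <- file.exists("DESCRIPTION") &&
  requireNamespace("devtools", quietly = TRUE)

if (in_package) {
  # Step 2: convert the roxygen comments in R/ into .Rd files in man/
  devtools::document()

  # Load the development version so the help system can find the new docs
  devtools::load_all()

  # Step 3: at the console, you would now preview a topic with, e.g., ?add
}
```

If the preview looks wrong, edit the roxygen comments (not the .Rd file) and run the same calls again.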

16.2.2 roxygen2 comments, blocks, and tags

Now that you understand the basic workflow, let’s talk a little more about roxygen2’s syntax. roxygen2 comments start with #' and the set of roxygen2 comments preceding a function is called a block. Blocks are broken up by tags, which look like @tagName tagValue. The content of a tag extends from the end of the tag name to the start of the next tag. A block can contain text before the first tag, which is called the introduction. By default, each roxygen2 block will generate a single documentation topic, corresponding to one .Rd file in the man/ directory.

Throughout this chapter I’m going to show you roxygen2 comments from real tidyverse packages, focusing on stringr since the functions there tend to be fairly straightforward, leading to documentation that is easier to excerpt. Here’s a simple example to start with using the documentation for stringr::str_unique():

#' Remove duplicated strings
#'
#' `str_unique()` removes duplicated values, with optional control over
#' how duplication is measured.
#'
#' @param string A character vector to return unique entries.
#' @param ... Other options used to control matching behavior between duplicate
#'   strings. Passed on to [stringi::stri_opts_collator()].
#' @returns A character vector.
#' @seealso [unique()], [stringi::stri_unique()] which this function wraps.
#' @examples
#' str_unique(c("a", "b", "c", "b", "a"))
#'
#' # Use ... to pass additional arguments to stri_unique()
#' str_unique(c("motley", "mötley", "pinguino", "pingüino"))
#' str_unique(c("motley", "mötley", "pinguino", "pingüino"), strength = 1)
#' @export
str_unique <- function(string, ...) {
  ...
}

Here the introduction includes the title (“Remove duplicated strings”) and a basic description of what the function does. It’s followed by six tags: two @params, one @returns, one @seealso, one @examples, and one @export. Note that I’ve wrapped each line of the roxygen2 block 80 characters wide, to match the wrapping of my code, and I’ve indented the second and subsequent lines of the long @param tag so it’s easier to scan. You can get more documentation style advice in the tidyverse style guide.

The following sections will work through the most important tags. We’ll start with the introduction which provides the title, description, and details, then we’ll cover the inputs (the function arguments), outputs (the return value), and examples.

16.3 Title, description, and details

The introduction provides the title, description, and, optionally, the details, of the function:

  • The title is taken from the first sentence. The title is shown in various function indexes so is what the user will see when browsing functions.

  • The description is taken from the next paragraph. It comes first in the documentation and should briefly describe the most important features of the function.

  • The details are taken from any additional text. Details are optional, but can be any length so are useful if you want to dig deep into some important aspect of the function.
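
As a rough illustration of how the introduction is split up, here is a roxygen2 block for a hypothetical count_words() function (invented for this sketch, not taken from any real package). The first line becomes the title, the second paragraph becomes the description, and the third paragraph becomes the details.

```r
#' Count the words in a string
#'
#' `count_words()` splits each string on whitespace and returns the number
#' of pieces.
#'
#' Runs of whitespace are treated as a single separator, and leading or
#' trailing whitespace is ignored.
#'
#' @param string A character vector.
#' @returns An integer vector the same length as `string`.
count_words <- function(string) {
  # Trim the ends first so leading whitespace doesn't create an empty piece
  lengths(strsplit(trimws(string), "\\s+"))
}
```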

The following sections describe each component in more detail, and then discuss a few useful related tags.

16.3.1 Title

The title should be written in sentence case and not end in a full stop. When figuring out what to use as a title, I think it’s most important to consider the functions in your package holistically. When the user is skimming an index of functions, how will they know which function to use? What do functions have in common which doesn’t need to be repeated in every title? What is unique to that function and should be highlighted?

As an example, take the titles of some of the key dplyr functions:

  • mutate(): Create, modify, and delete columns
  • summarise(): Summarise each group to fewer rows
  • filter(): Subset rows using column values
  • select(): Subset columns using their names and types
  • arrange(): Arrange rows by column values

Here we’ve tried to succinctly describe what the function does, making sure to describe whether it affects rows, columns, or groups. Where possible, we’ve tried to use synonyms of the function name in the title to hopefully give folks another chance to understand the intent of the function.

At the time we wrote this, I don’t think the function titles for stringr were that successful. But they provide a useful negative case study.

There’s a lot of repetition (“pattern”, “from a string”) and the verb used for the function name is repeated in the title, so if you don’t understand the function already, the title seems unlikely to help much. (In hindsight, it also seems like the function names could have been better chosen.) Hopefully we’ll have improved those titles by the time you read this.

16.3.2 Description

The goal of the description is to summarize the purpose of the function, usually in under a paragraph. This can be challenging because the title of the function is also a very concise summary of the function. And it’s often especially hard if you’ve just written the function because the purpose seems so intuitively obvious it’s hard to understand why anyone would need an explanation.

It’s ok for the description to be a little duplicative of the rest of the documentation; it’s often useful for the reader to see the same thing expressed in two different ways. It’s a little extra work keeping it all up to date, but the extra effort is often worth it.

#' Detect the presence/absence of a pattern
#'
#' `str_detect()` returns a logical vector: `TRUE` if `pattern` is found within
#' the corresponding element of `string`, and `FALSE` if not. It's equivalent
#' to `grepl(pattern, string)`.

If you want to include multiple paragraphs of text or other structure (like a bulleted list), you can use the explicit @description tag. Here’s an example from the documentation of stringr::str_like(), which mimics the LIKE operator from SQL:

#' Detect a pattern in the same way as `SQL`'s `LIKE` operator
#'
#' @description
#' `str_like()` follows the conventions of the SQL `LIKE` operator:
#'
#' * Must match the entire string.
#' * `_` matches a single character (like `.`).
#' * `%` matches any number of characters (like `.*`).
#' * `\%` and `\_` match literal `%` and `_`.
#' * The match is case insensitive by default.

You can also use explicit @title and @details tags but we don’t recommend it as it adds extra noise to the docs without enabling any additional functionality.

16.3.3 Details

If you have a lot of information to convey in the details, I recommend using markdown headings to break up the documentation into sections. Here’s an example from dplyr::mutate(). We’ve elided some of the details to keep this example short, but you should still get a sense of how we used headings to break up the content into skimmable chunks.

#' Create, modify, and delete columns
#'
#' `mutate()` adds new variables and preserves existing ones;
#' `transmute()` adds new variables and drops existing ones.
#' New variables overwrite existing variables of the same name.
#' Variables can be removed by setting their value to `NULL`.
#'
#' # Useful mutate functions
#'
#' * [`+`], [`-`], [log()], etc., for their usual mathematical meanings
#'
#' ...
#'
#' # Grouped tibbles
#'
#' Because mutating expressions are computed within groups, they may
#' yield different results on grouped tibbles. This will be the case
#' as soon as an aggregating, lagging, or ranking function is
#' involved. Compare this ungrouped mutate:
#'
#' ...

Note that even though these headings come immediately after the description they are shown much later (after the function arguments and return value) in the rendered documentation.

In older code, you might also see the use of @section title: which was used to create headings before roxygen2 fully supported RMarkdown. You can move these below the description and turn them into markdown headings.
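
To make that conversion concrete, here is a sketch using a hypothetical rescale01() function (invented for illustration). The first block uses the older @section tag; the second expresses the same content as a markdown heading placed below the description. It assumes markdown support is turned on for the package's roxygen2 comments.

```r
# Older style: an explicit @section tag whose title ends with a colon.

#' Rescale a numeric vector to the range 0 to 1
#'
#' `rescale01_old()` linearly rescales `x` so that its smallest value is 0
#' and its largest value is 1.
#'
#' @param x A numeric vector.
#' @returns A numeric vector the same length as `x`.
#' @section Missing values:
#' Missing values are ignored when computing the range and are returned
#' as `NA`.
rescale01_old <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Newer style: the same content as a markdown heading below the description.

#' Rescale a numeric vector to the range 0 to 1
#'
#' `rescale01()` linearly rescales `x` so that its smallest value is 0
#' and its largest value is 1.
#'
#' # Missing values
#'
#' Missing values are ignored when computing the range and are returned
#' as `NA`.
#'
#' @param x A numeric vector.
#' @returns A numeric vector the same length as `x`.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```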

16.4 Arguments

For most functions, the bulk of your documentation effort will go towards documenting how each argument affects the output of the function. For this purpose, you’ll use the @param tag (short for parameter, a synonym of argument), which is always followed by the argument name and then a description of its action. The description is a sentence so it should start with a capital letter and end with a full stop.

The most important job of the description is to provide a succinct summary of the allowed inputs and what the parameter does. For example, here’s stringr::str_detect():

#' @param string Input vector. Either a character vector, or something
#'  coercible to one.

If the argument has a default value, it’s a good idea to repeat it in the documentation because the function usage (which shows the default values) and the argument description are quite far apart in the docs. For example, here’s str_flatten():

#' @param collapse String to insert between each piece. Defaults to `""`.

If an argument has a fixed set of possible values, you should list them. If they’re simple, you can just list them in a sentence, like in str_trim():

#' @param side Side on which to remove whitespace: `"left"`, `"right"`, or
#'   `"both"` (the default).

If they need more explanation, you might use a bulleted list, as in str_wrap():

#' @param whitespace_only A boolean.
#'   * `TRUE` (the default): wrapping will only occur at whitespace.
#'   * `FALSE`: can break on any non-word character (e.g. `/`, `-`).
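
Putting these conventions together, a hypothetical pad_string() function (invented for illustration, not part of stringr) might document its defaults and allowed values as in the sketch below. Using match.arg() in the body is one way to keep the documented set of values and the accepted set in sync.

```r
#' Pad strings to a minimum width
#'
#' @param string A character vector.
#' @param width Minimum width of the padded strings. A single whole number.
#' @param side Side on which to add padding: `"left"` (the default) or
#'   `"right"`.
#' @param pad A single character used for padding. Defaults to `" "`.
#' @returns A character vector the same length as `string`.
pad_string <- function(string, width, side = c("left", "right"), pad = " ") {
  # match.arg() picks "left" when `side` is not supplied and errors on
  # values outside the documented set
  side <- match.arg(side)

  # Number of padding characters needed for each element (never negative)
  n <- pmax(width - nchar(string), 0)
  padding <- strrep(pad, n)

  if (side == "left") paste0(padding, string) else paste0(string, padding)
}
```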

16.4.1 Multiple arguments

If multiple arguments are tightly coupled, you can document them together by separating the names with commas (with no spaces). For example, in stringr::str_equal() x and y are interchangeable, so they’re documented together:

#' @param x,y A pair of character vectors.

In str_sub() start and end define the range of characters to replace, and you can use just start if you pass in a two-column matrix. So it makes sense to document them together:

#' @param start,end Two integer vectors. `start` gives the position
#'   of the first character (defaults to first), `end` gives the position
#'   of the last (defaults to last character). Alternatively, pass a two-column
#'   matrix to `start`.
#'
#'   Negative values count backwards from the last character.
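
The same pattern works for any pair of tightly coupled arguments. As a minimal sketch, here is a hypothetical clamp() function (not from stringr) whose lower and upper bounds only make sense when described together:

```r
#' Restrict values to a range
#'
#' @param x A numeric vector.
#' @param lower,upper Single numbers giving the smallest and largest values
#'   allowed in the output. Values of `x` outside this range are replaced
#'   by the nearest bound.
#' @returns A numeric vector the same length as `x`.
clamp <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}
```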

16.4.2 Inheriting arguments

You can inherit argument docs from another function using @inheritParams function_name. stringr uses @inheritParams extensively because many functions have string and pattern arguments. So str_detect() documents them in detail:

#' @param string Input vector. Either a character vector, or something
#'  coercible to one.
#' @param pattern Pattern to look for.
#'
#'   The default interpretation is a regular expression, as described in
#'   `vignette("regular-expressions")`. Control options with [regex()].
#'
#'   Match a fixed string (i.e. by comparing only bytes), using
#'   [fixed()]. This is fast, but approximate. Generally,
#'   for matching human text, you'll want [coll()] which
#'   respects character matching rules for the specified locale.
#'
#'   Match character, word, line and sentence boundaries with
#'   [boundary()]. An empty pattern, "", is equivalent to
#'   `boundary("character")`.

Then the majority of the other stringr functions use @inheritParams str_detect to get a detailed argument description without having to copy and paste.

@inheritParams only inherits docs for arguments that aren’t already documented, allowing you to document some and inherit others. str_match() uses this to document its unusual pattern argument:

#' @inheritParams str_detect
#' @param pattern Unlike other stringr functions, `str_match()` only supports
#'   regular expressions, as described in `vignette("regular-expressions")`.
#'   The pattern should contain at least one capturing group.

The source can be a function in the current package, via @inheritParams function, or another package, via @inheritParams package::function.
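
Here is a self-contained sketch of the within-package case, using two hypothetical functions (count_pieces() and first_piece() are invented for this example). The second block inherits the documentation for string but supplies its own documentation for sep, because its default differs.

```r
#' Count the pieces of a string
#'
#' @param string A character vector.
#' @param sep A single string that separates the pieces. Defaults to `","`.
#' @returns An integer vector the same length as `string`.
count_pieces <- function(string, sep = ",") {
  lengths(strsplit(string, sep, fixed = TRUE))
}

#' Extract the first piece of a string
#'
#' @inheritParams count_pieces
#' @param sep A single string that separates the pieces. Unlike
#'   `count_pieces()`, this defaults to a single space, `" "`.
#' @returns A character vector the same length as `string`.
first_piece <- function(string, sep = " ") {
  pieces <- strsplit(string, sep, fixed = TRUE)
  # Take the first piece of each element; empty strings give NA
  vapply(pieces, function(p) p[1], character(1))
}
```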

16.4.3 Multiple functions in one file

By default, each function gets its own documentation topic, but if functions are closely related it often makes sense to combine them into one topic. For example, take str_length() and str_width() which provide two different ways of computing the size of a string. As you can see from the description, both functions are documented together, because this makes it easy to see how they differ:

#' The length/width of a string
#'
#' @description
#' `str_length()` returns the number of codepoints in a string. These are
#' the individual elements (which are often, but not always letters) that
#' can be extracted with [str_sub()].
#'
#' `str_width()` returns how much space the string will occupy when printed
#' in a fixed width font (i.e. when printed in the console).
#'
#' ...
str_length <- function(string) {
  ...
}

This works because str_width() uses @rdname str_length so that its documentation is included in an existing topic:

#' @rdname str_length
str_width <- function(string) {
  ...
}
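
The same arrangement can be sketched end to end with a pair of hypothetical functions (c_to_f() and f_to_c() are invented for this example). The first block carries the shared title, description, and tags; the second block contributes nothing but its usage to the c_to_f topic.

```r
#' Convert between Celsius and Fahrenheit
#'
#' @description
#' `c_to_f()` converts degrees Celsius to degrees Fahrenheit.
#'
#' `f_to_c()` converts degrees Fahrenheit to degrees Celsius.
#'
#' @param temp A numeric vector of temperatures.
#' @returns A numeric vector the same length as `temp`.
c_to_f <- function(temp) {
  temp * 9 / 5 + 32
}

#' @rdname c_to_f
f_to_c <- function(temp) {
  (temp - 32) * 5 / 9
}
```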

There are two ways to use @rdname. You can add documentation to an existing function:

16.5 Return value

As important as the inputs to the function is the output from the function. The job of the @returns tag is to document the output. Here the goal is not to describe exactly how the values are computed (which is the job of the description and details), but to roughly describe the overall “shape” of the output, i.e. what sort of object it is, and if appropriate its size. For example, if your function returns a vector you should say what type and its length, or if your function returns a data frame you should describe the names of the columns, the type of each column, and the number of rows.

The return documentation for functions in stringr is mostly pretty simple: they return some type of vector the same length as some input. For example, take str_like():

#' @returns A logical vector the same length as `string`.
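
For a function that returns a data frame, the description of the “shape” needs a little more room. The following sketch uses a hypothetical summarise_values() function (invented for illustration) to show one way of listing the columns, their types, and the number of rows.

```r
#' Summarise a numeric vector
#'
#' @param x A numeric vector.
#' @returns A data frame with one row and three columns:
#'   * `n`: an integer, the number of non-missing values in `x`.
#'   * `mean`: a double, the mean of the non-missing values.
#'   * `sd`: a double, the standard deviation of the non-missing values.
summarise_values <- function(x) {
  # Drop missing values up front so all three summaries agree
  x <- x[!is.na(x)]
  data.frame(n = length(x), mean = mean(x), sd = stats::sd(x))
}
```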

If your package has multiple related functions, it’s useful to consistently think about what makes them different. For example, dplyr functions take data frames as inputs and return data frames as outputs. But the details of that transformation differ, so each function documents what happens to the rows, the columns, the groups, and any additional attributes. For example, here’s dplyr::filter():

#' @returns
#' An object of the same type as `.data`. The output has the following properties:
#'
#' * Rows are a subset of the input, but appear in the same order.
#' * Columns are not modified.
#' * The number of groups may be reduced (if `.preserve` is not `TRUE`).
#' * Data frame attributes are preserved.

It’s also appropriate to describe important warnings or errors that the user might see here. For example, here’s readr::read_csv():

#' @returns A [tibble()]. If there are parsing problems, a warning will alert you.
#'   You can retrieve the full details by calling [problems()] on your dataset.

For initial CRAN submission, all functions must document their return value with @return. This is not required for subsequent submissions, but it’s good practice. There’s currently no way to enforce this (we’re working on it), which is why you’ll notice some tidyverse functions lack documentation of their outputs.

16.6 Examples

Describing how a function works is useful, but showing how it works is often even better. That’s the purpose of the @examples tag, which uses executable R code to show what you can do with the function.

Use examples to show the basic operation of the function, and then to highlight any particularly important properties. str_detect() starts by showing a few simple variations and then highlights a property you might easily miss from reading the docs: as well as passing a vector of strings and one pattern, you can also pass one string and a vector of patterns.

#' @examples
#' fruit <- c("apple", "banana", "pear", "pineapple")
#' str_detect(fruit, "a")
#' str_detect(fruit, "^a")
#' str_detect(fruit, "a$")
#' 
#' # Also vectorised over pattern
#' str_detect("aecfg", letters)

Try to stay focused on the most important features without getting into the weeds of every last edge case: if you make the examples too long, it becomes hard for the user to find the key application that they’re looking for.

Bear in mind that you want examples to execute relatively quickly so users can run them, and so that when you make a website for your package it doesn’t take ages to generate the documentation.

If submitting to CRAN, examples must run in under 10 minutes.

16.6.1 Execution

Examples are run in four common cases:

  • Interactively using the example() function.
  • R CMD check on a computer you control (e.g. your development machine and your CI/CD server).
  • R CMD check on a computer you don’t control (e.g. CRAN).
  • When building your pkgdown website.

This means that your example code must run without error in all four cases. The code must therefore be self-contained and only use packages that are listed in the Imports and Suggests fields of DESCRIPTION.

16.6.2 Things to avoid

There are a few constraints imposed by CRAN on examples because if a user runs the example interactively with example() you don’t want to mess up their current session. This means that you shouldn’t make changes to the global state, so:

  • Don’t change global options with options() and don’t mess with the working directory.
  • Don’t create files in the current working directory. Instead write them to a temporary directory, and make sure to clean them up at the end of the example.
  • Don’t write to the clipboard.
  • Avoid depending on external resources that might occasionally fail.
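
As a sketch of the temporary-file advice, here is a hypothetical save_csv() function (invented for this example) whose examples write to a path from tempfile() and then remove the file with unlink(), leaving the user's working directory untouched.

```r
#' Save a data frame as a CSV file
#'
#' @param data A data frame.
#' @param path Path of the file to create.
#' @returns `path`, invisibly.
#' @examples
#' # Write to a temporary file, not the working directory
#' path <- tempfile(fileext = ".csv")
#' save_csv(head(mtcars), path)
#' read.csv(path)
#'
#' # Clean up
#' unlink(path)
save_csv <- function(data, path) {
  utils::write.csv(data, path, row.names = FALSE)
  invisible(path)
}
```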

16.6.3 Errors

So what can you do if you want to include code that causes an error for the purposes of teaching? There are two basic options:

  • You can wrap the code in try() so that the error is shown, but doesn’t stop execution of the example.
  • You can wrap the code in \dontrun{} so it is never run by example().
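
Both options can appear in the same @examples block. The sketch below uses a hypothetical safe_sqrt() function (invented for illustration): the try() call displays the error and carries on, while the code inside \dontrun{} is shown to the reader but skipped when the examples are run.

```r
#' Square root that insists on numeric input
#'
#' @param x A numeric vector.
#' @returns A numeric vector the same length as `x`.
#' @examples
#' safe_sqrt(4)
#'
#' # The error is displayed, but the remaining examples still run
#' try(safe_sqrt("a"))
#'
#' \dontrun{
#' # Shown in the help page, but not run by example()
#' safe_sqrt("b")
#' }
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.", call. = FALSE)
  }
  sqrt(x)
}
```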

16.6.4 Conditional execution

In other cases, you might want code to run only in specific scenarios. In the most common case, you don’t want to run code on CRAN because you’re doing something that is usually best avoided (see above) or your examples need other setup that CRAN won’t have. In this case you can use @examplesIf instead of @examples. The code in an @examplesIf block will only be executed if some condition is TRUE:

#' @examplesIf some_function()
#' some_other_function()
#' some_more_functions()

For example, googledrive uses @examplesIf in almost every function because the examples can only work if you have an active, authenticated connection to Google Drive, as judged by googledrive::drive_has_token(). For example, here’s googledrive::drive_publish():

#' @examplesIf drive_has_token()
#' # Create a file to publish
#' file <- drive_example_remote("chicken_sheet") %>%
#'   drive_cp()
#'
#' # Publish file
#' file <- drive_publish(file)
#' file$published
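
If your package has no ready-made predicate like drive_has_token(), you can write a small one yourself. The following is only a sketch: has_api_key(), api_key_hint(), and the MYPKG_API_KEY environment variable are all hypothetical names chosen for illustration.

```r
#' Is an API key available?
#'
#' @returns `TRUE` if the (hypothetical) `MYPKG_API_KEY` environment
#'   variable is set to a non-empty value, otherwise `FALSE`.
has_api_key <- function() {
  nzchar(Sys.getenv("MYPKG_API_KEY"))
}

#' Show a short, non-secret hint of the configured API key
#'
#' @returns A single string.
#' @examplesIf has_api_key()
#' api_key_hint()
api_key_hint <- function() {
  key <- Sys.getenv("MYPKG_API_KEY")
  # Reveal only the first two characters of the key
  paste0(substr(key, 1, 2), "...")
}
```

On a machine without the environment variable, has_api_key() returns FALSE, so the example is skipped rather than failing.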

For initial CRAN submission of your package, all functions must contain some runnable examples (i.e. there must be examples and they must not all be wrapped in \dontrun{}).

16.6.5 Intermixing examples and text

It’s also possible to show example code in the text with code blocks, either ```R if you just want to show some code or ```{r} if you want the code to be run.

16.6.6 Organisation

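Long examples are easier to scan when comment headings divide them into sections, as in the shared examples for tidyr::chop() and tidyr::unchop():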

#' @examples
#' # Chop ==============================================================
#' df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
#' # Note that we get one row of output for each unique combination of
#' # non-chopped variables
#' df %>% chop(c(y, z))
#' # cf nest
#' df %>% nest(data = c(y, z))
#'
#' # Unchop ============================================================
#' df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
#' df %>% unchop(y)
#' df %>% unchop(y, keep_empty = TRUE)
#' 
#' # Incompatible types -------------------------------------------------
#' # If the list-col contains types that can not be natively
#' df <- tibble(x = 1:2, y = list("1", 1:3))
#' try(df %>% unchop(y))

16.8 Re-using documentation

There is a tension between the DRY (don’t repeat yourself) principle of programming and the need for documentation to be self-contained. It’s frustrating to have to navigate through multiple help files in order to pull together all the pieces you need. roxygen2 provides a number of ways to avoid you having to repeat yourself as a developer, while not forcing the user to follow a spiderweb of links to find everything they need.

Here we’ll focus on two RMarkdown features supported by roxygen2 that you can use to share documentation:

  • You can use child documents to share Rmd between topics.
  • You can use inline R code to generate documentation.

16.8.1 @inherits

16.8.2 Child documents

Allows you to share documentation