R Packages: Organize, Test, Document, and Share Your Code by Hadley Wickham and Jennifer Bryan (2023, O'Reilly)

Welcome to R Packages by Hadley Wickham and Jennifer Bryan. Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. In this book you’ll learn how to turn your code into packages that others can easily download and use. Writing a package can seem overwhelming at first, so start with the basics and improve it over time. It doesn’t matter if your first version isn’t perfect as long as the next version is better.

Changes

If you’re familiar with the first edition of the book, this preface describes the major changes so that you can focus your reading on the new areas. There are several main goals for this edition:

  • Update to reflect changes in the devtools package, specifically, its “conscious uncoupling” into a set of smaller, more focused packages.
  • Expand coverage of workflow and process, alongside the presentation of all the important moving parts that make up an R package.
  • Cover entirely new topics, such as package websites and GitHub Actions (GHA).

All content has been completely revised and updated. Many chapters are new or reorganized and a couple have been removed:

  • New Chapter 1, “The Whole Game” previews the entire package development process.
  • New Chapter 2, “System Setup” has been carved out of the previous Introduction and gained more detail.
  • The chapter formerly known as “Package Structure” has been expanded and split into two chapters, one covering package structure and state (Chapter 3) and another on workflows and tooling (Chapter 4).
  • New Chapter 5, “The Package Within” demonstrates how to extract reusable logic out of data analysis scripts and into a package.
  • The sections “Organizing Your Functions” and “Code Style,” from Chapter 6, “R Code” have been removed, in favor of an online style guide. The style guide is paired with the new styler package, which can automatically apply many of the rules.
  • The coverage of testing has expanded into three chapters: Chapter 13 for testing basics, Chapter 14 for test suite design, and Chapter 15 for various advanced topics.
  • Material around the NAMESPACE file and dependency relationships has been reorganized into two chapters: Chapter 10 provides technical context for thinking about dependencies, and Chapter 11 gives practical instructions for using different types of dependencies in different settings.
  • New Chapter 12, “Licensing” expands earlier content on licensing into its own chapter.
  • The chapter on C/C++ has been removed. It didn’t have quite enough information to be useful, and since the first edition of the book, other materials have arisen that are better learning resources.
  • The chapter on Git/GitHub has been reframed around the more general topic of software development practices (Chapter 20). This no longer includes step-by-step instructions for basic tasks. The use of Git/GitHub has exploded since the first edition, accompanied by an explosion of learning resources, both general and specific to R (e.g., the website Happy Git and GitHub for the useR). Git/GitHub still feature prominently throughout the book, most especially in Chapter 20.
  • The very short inst chapter has been combined into Chapter 8, with all the other directories that can be important in specific contexts, but that aren’t mission critical to all packages.

Introduction

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests and is easy to share with others. As of March 2023, there were over 19,000 packages available on the Comprehensive R Archive Network, or CRAN, the public clearinghouse for R packages. This huge variety of packages is one of the reasons that R is so successful: the chances are that someone has already solved a problem you’re working on, and you can benefit from their work by downloading their package. If you’re reading this book, you already know how to work with packages in the following ways:

  • You install them from CRAN with install.packages("x").
  • You use them in R with library("x") or library(x).
  • You get help on them with package?x and help(package = "x").
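
As a quick refresher, the steps above might look like this in an R session. This is only a sketch: it uses the stats package, which ships with R, as a stand-in for x so that it runs without a network connection, and the install and help calls are left commented out for the same reason.

```r
# Installing from CRAN needs a network connection, so it is commented out.
# install.packages("x")

# Attach a package; the quoted and unquoted forms are equivalent here.
library("stats")
library(stats)

# Once attached, the package appears on the search path.
attached <- "package:stats" %in% search()
print(attached)

# Help for a whole package (opens a help viewer when run interactively):
# help(package = "stats")
```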

The goal of this book is to teach you how to develop packages so that you can write your own, not just use other people’s. Why write a package? One compelling reason is that you have code that you want to share with others. Bundling your code into a package makes it easy for other people to use it, because like you, they already know how to use packages. If your code is in a package, any R user can easily download it, install it, and learn how to use it.

But packages are useful even if you never share your code. As Hilary Parker says in her introduction to packages: “Seriously, it doesn’t have to be about sharing your code (although that is an added benefit!). It is about saving yourself time.” Organizing code in a package makes your life easier because packages come with conventions. For example, you put R code in R/, you put tests in tests/, and you put data in data/. These conventions are helpful because:

  • They save time—you don’t need to think about the best way to organize a project, you can just follow a template.
  • Standardized conventions lead to standardized tools—if you buy into R’s package conventions, you get many tools for free.
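
To make the conventions concrete, here is a minimal sketch that builds the three directories named above inside a temporary folder using only base R. The package name mypkg is hypothetical, and the result is just the directory layout, not a complete installable package.

```r
# Build an empty skeleton that follows the conventions, inside a temp dir.
# "mypkg" is a hypothetical package name used only for illustration.
pkg_root <- file.path(tempdir(), "mypkg")
for (d in c("R", "tests", "data")) {
  dir.create(file.path(pkg_root, d), recursive = TRUE, showWarnings = FALSE)
}

# List the top-level directories that now exist.
layout <- list.files(pkg_root)
print(layout)
```

Because every package uses the same layout, a tool can find your code, tests, and data without any per-project configuration.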

It’s even possible to use packages to structure your data analyses (e.g., “Packaging Data Analytical Work Reproducibly Using R (and Friends)” in The American Statistician or PeerJ Preprints), although we won’t delve deeply into that use case here.

Philosophy

This book espouses our philosophy of package development: anything that can be automated, should be automated. Do as little as possible by hand. Do as much as possible with functions. The goal is to spend your time thinking about what you want your package to do rather than thinking about the minutiae of package structure.

This philosophy is realized primarily through the devtools package, which is the public face for a suite of R functions that automate common development tasks. The release of version 2.0.0 in October 2018 marked its internal restructuring into a set of more focused packages, with devtools becoming more of a metapackage. The usethis package is the subpackage you are most likely to interact with directly; we explain the devtools-usethis relationship in “devtools, usethis, and You”.
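
As an illustration of doing the work with functions rather than by hand, the sketch below asks usethis to create a package skeleton in a temporary folder. It assumes usethis is installed and skips the step otherwise; the package name mypkg is hypothetical, and the commented lines show typical follow-up calls rather than a prescribed sequence.

```r
# A sketch of automated setup; it runs only if usethis is installed.
created <- FALSE
pkg_path <- file.path(tempdir(), "mypkg")  # hypothetical package name

if (requireNamespace("usethis", quietly = TRUE)) {
  # Writes DESCRIPTION, NAMESPACE, and an R/ directory in one call.
  usethis::create_package(pkg_path, open = FALSE)
  created <- file.exists(file.path(pkg_path, "DESCRIPTION"))
}

# Typical next steps, run from inside the new package:
# usethis::use_r("hello")   # create R/hello.R
# devtools::load_all()      # load the package for interactive use
# devtools::check()         # run R CMD check
```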

As always, the goal of devtools is to make package development as painless as possible. It encapsulates the best practices developed by Hadley Wickham, initially from his years as a prolific solo developer. More recently, he has assembled a team of developers at Posit (formerly known as RStudio), who collectively look after hundreds of open source R packages, including those known as the tidyverse. The reach of this team allows us to explore the space of all possible mistakes at an extraordinary scale.

Fortunately, it also affords us the opportunity to reflect on both the successes and failures, in the company of expert and sympathetic colleagues. We try to develop practices that make life more enjoyable for both the maintainer and users of a package. The devtools metapackage is where these lessons are made concrete.

devtools works hand-in-hand with RStudio, which we believe is the best development environment for most R users. The most popular alternative to RStudio is currently Visual Studio Code (VS Code) with the R extension enabled. This can be a rewarding and powerful environment; however, it does require a bit more work to set up and customize.

Together, devtools and RStudio insulate you from the low-level details of how packages are built. As you start to develop more packages, we highly recommend that you learn more about those details. The best resource for the official details of package development is always the official Writing R Extensions manual. However, this manual can be hard to understand if you’re not already familiar with the basics of packages. It’s also exhaustive, covering every possible package component, rather than focusing on the most common and useful components, as this book does. Writing R Extensions is a useful resource once you’ve mastered the basics and want to learn what’s going on under the hood.

In this book

The first part of the book is all about giving you the tools you need to start your package development journey, and we highly recommend that you read it in order. We begin in Chapter 1 with a run-through of the complete development of a small package. It’s meant to paint the big picture and suggest a workflow, before we descend into the detailed treatment of the key components of an R package. Then in Chapter 2 you’ll learn how to prepare your system for package development, and in Chapter 3 you’ll learn the basic structure of a package and how that varies across different states. Next, in Chapter 4, we’ll cover the core workflows that come up repeatedly for package developers. The first part of the book ends with another case study (Chapter 5), this time focusing on how you might convert a script to a package and discussing the challenges you’ll face along the way.

The remainder of the book is designed to be read as needed. Pick and choose between the chapters as the various topics come up in your development process.

First we cover key package components: Chapter 6 discusses where your code lives and how to organize it, Chapter 7 shows you how to include data in your package, and Chapter 8 covers a few less important files and directories that need to be discussed somewhere.

Next we’ll dive into the package metadata, starting with DESCRIPTION in Chapter 9. We’ll then go deep into dependencies. In Chapter 10, we’ll cover the costs and benefits of taking on dependencies and provide some technical background on package namespaces and the search path. In Chapter 11, we focus on practical matters, such as how to use different types of dependencies in different parts of your package. This is also where we discuss exporting functions, which is what makes it possible for other packages and projects to depend on your package. We’ll finish off this part with a look at licensing in Chapter 12.

To ensure your package works as designed (and continues to work as you make changes), it’s essential to test your code, so the next three chapters cover the art and science of testing. Chapter 13 gets you started with the basics of testing with the testthat package. Chapter 14 teaches you how to design and organize tests in the most effective way. Then we finish off our coverage of testing in Chapter 15, which teaches you advanced skills to tackle challenging situations.
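
For a first taste of what a testthat test looks like, here is a minimal sketch. The function add_one() is hypothetical and exists only to have something to test; the test itself runs only if testthat is installed.

```r
# Hypothetical function under test.
add_one <- function(x) {
  x + 1
}

# Each test_that() call groups related expectations under a description.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("add_one() increments its input", {
    testthat::expect_equal(add_one(1), 2)
    testthat::expect_equal(add_one(c(0, 10)), c(1, 11))
  })
}
```

In a real package, tests like this would live in files under tests/ rather than being run inline.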

If you want other people (including future-you!) to understand how to use the functions in your package, you’ll need to document them. Chapter 16 gets you started using roxygen2 to document the functions in your package. Function documentation is helpful only if you know what function to look up, so next in Chapter 17 we’ll discuss vignettes, which help you document the package as a whole. We’ll finish up documentation with a discussion of other important markdown files like README.md and NEWS.md in Chapter 18, and creating a package website with pkgdown in Chapter 19.
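
As a preview of function documentation with roxygen2, the sketch below shows specially formatted comments placed directly above a function. The function celsius_to_fahrenheit() is a hypothetical example. The comments are ordinary R comments, so the code runs as is; roxygen2 reads them to generate the help files.

```r
#' Convert Celsius to Fahrenheit
#'
#' @param celsius A numeric vector of temperatures in degrees Celsius.
#' @return A numeric vector of the same length, in degrees Fahrenheit.
#' @export
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# The roxygen comments do not change how the function behaves.
boiling <- celsius_to_fahrenheit(100)
print(boiling)
```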

The book concludes by zooming back out to consider development practices, such as the benefit of using version control and continuous integration (Chapter 20). We wrap things up by discussing the lifecycle (Chapter 21) of a package, including releasing it on CRAN (Chapter 22).

This is a lot to learn, but don’t feel overwhelmed. Start with a minimal subset of useful features (e.g., just an R/ directory!) and build up over time. To paraphrase the Zen monk Shunryu Suzuki: “Each package is perfect the way it is—and it can use a little improvement.”

What’s Not Here

There are also specific practices that have little to no treatment here simply because we do not use them enough to have any special insight. Does this mean that we actively discourage those practices? Probably not, as we try to be explicit about practices we think you should avoid. So if something is not covered here, it just means that a couple hundred heavily used R packages are built without meaningful reliance on that technique. That observation should motivate you to evaluate how likely it is that your development requirements truly don’t overlap with ours. But sometimes the answer is a clear “yes,” in which case you’ll simply need to consult another resource.
