When you don’t want to write dozens of versions of unit tests.
I had never heard of property-based testing until a few months ago when I started looking at some pull requests in polars (the Python implementation, not the R one) where they use hypothesis, for example polars#17992.
I have contributed to a fair number of R packages but I had never seen this type of test before: unit tests, plenty; snapshot tests, sometimes; but property-based tests? Never. At first I didn’t really see the point, but I’ve had a couple of situations recently where I thought it could help, so the aim of this post is to explain (briefly) what property-based testing is and to provide some examples where it can be useful.
Most of the time, unit tests check that the function performs well on a few different inputs: does it give correct results? Nice error messages? What about this corner case?
Property-based testing is a way of testing where we give random inputs to the function we want to test and ensure that, no matter the inputs, the output respects some properties. For example, suppose we made a function to reverse the input: if I pass 3, 1, 4, it should return 4, 1, 3. We could pass several inputs and check that each output is correctly reversed. But a more efficient way is to check that our function respects a basic property: reversing the input twice should return the original input.
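To make this concrete, here is a minimal sketch in base R (the example vector is purely illustrative): a hardcoded check needs to know the expected output in advance, while the property only relies on rev() itself.
x <- c(3, 1, 4)
# hardcoded check: we must spell out the expected output
identical(rev(x), c(4, 1, 3))
[1] TRUE
# property: reversing twice returns the original input, whatever x is
identical(rev(rev(x)), x)
[1] TRUE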
Therefore, property-based testing doesn’t use hardcoded values to check the output but ensures that our function respects a list of properties.
To the best of my knowledge, there are two packages for property-based testing in R: hedgehog and quickcheck (which is based on hedgehog). If you already use testthat, integrating them in the test suite is not hard. Using the example above, we could do:
library(quickcheck)
library(testthat)
test_that("reversing twice returns the original input", {
for_all(
a = numeric_(any_na = TRUE),
property = function(a) {
expect_equal(rev(rev(a)), a)
}
)
})
Test passed 😀
This example generated 100 random inputs and checked that the expect_equal() expectation held for all of them. We can see this by adding a print() call (I reduce the number of tests to 5 to avoid too much clutter):
test_that("reversing twice returns the original input", {
for_all(
a = numeric_(any_na = TRUE),
tests = 5,
property = function(a) {
print(a)
expect_equal(rev(rev(a)), a)
}
)
})
[1] 9474 1816 5096 -5492 6089 0 7687 NA -1505
[1] -1906 -9064 47 9454 -5367 4836 NA
[1] 883085158 NA
[1] NA 0 5771 3808 -5308 -3313 -6441 2668 NA
[1] 0 0 -36502851 316004917 -237072955
Test passed 🥇
As we can see, a lot of different examples were generated: some have a single value while others have several, some only contain negative values while others have a mix, etc.
The example above only checked numeric inputs, but we can check any type of atomic vector using any_atomic():
test_that("reversing twice returns the original input", {
for_all(
a = any_atomic(any_na = TRUE),
tests = 5,
property = function(a) {
print(a)
expect_equal(rev(rev(a)), a)
}
)
})
[1] -52659396 NA 134735066
[1] NA -595851358 NA 716078985
[1] NA 774043668 NA -540014936 NA 109811027
[1] NA NA -699860696 -112767336 -718748598 378867662
[7] 0 NA 0 0
[1] "541L1(9+3" "R(H" "=u" "=Q]Zx&" NA
[6] "Q" NA "o5" "[^+zm$\"" "#;+y\"U@;|"
Test passed 🎊
Finally, if a particular input fails, quickcheck will first try to reduce the size of this input as much as possible (a process called “shrinking”). To illustrate that, let’s say we make a function to normalize a numeric vector to the [0, 1] interval:
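A minimal version of such a function could look like this (a sketch consistent with the fixed version shown further below; the example input is only an assumption chosen to match the printed output):
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
# example call (the input values are illustrative)
normalize(c(1.5, 3, 2, 0))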
[1] 0.5000000 1.0000000 0.6666667 0.0000000
One property of this function is that all output values should be in the interval [0, 1]. Does this function pass property-based tests?
test_that("output is in interval [0, 1]", {
for_all(
a = numeric_(any_na = TRUE),
tests = 5,
property = function(a) {
<- normalize(a)
res expect_true(all(res >= 0 & res <= 1))
}
)
})
── Failure: output is in interval [0, 1] ───────────────────────────────────────
Falsifiable after 1 tests, and 3 shrinks
<expectation_failure/expectation/error/condition>
all(res >= 0 & res <= 1) is not TRUE

`actual`:   <NA>
`expected`: TRUE

Backtrace:
    ▆
 1. └─quickcheck::for_all(...)
[...TRUNCATED...]

Counterexample:
$a
[1] -4037

Backtrace:
    ▆
 1. └─quickcheck::for_all(...)
 2. └─hedgehog::forall(...)
Ah-ah! Problem: what happens if the input is a single value? Then max(x) - min(x) is 0, so the division gives NaN. In the error message, we can see:
Falsifiable after 1 tests, and 3 shrinks
Shrinking is the process of reducing as much as possible the size of the input that makes the function fail. Having the smallest possible failing example is extremely useful when debugging.
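We can reproduce the issue directly with the counterexample found above, reusing the normalize() sketch from earlier:
# a single value: max(x) - min(x) is 0, so we divide 0 by 0
normalize(-4037)
[1] NaN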
Let’s fix the function and try again:
normalize <- function(x) {
  if (length(x) == 1) {
    return(0.5) # WARNING: this is for the sake of example, I don't
                # guarantee this is the correct behavior
  }
  (x - min(x)) / (max(x) - min(x))
}
test_that("output is in interval [0, 1]", {
for_all(
a = numeric_(any_na = TRUE),
tests = 5,
property = function(a) {
<- normalize(a)
res expect_true(all(res >= 0 & res <= 1))
}
)
})
── Failure: output is in interval [0, 1] ───────────────────────────────────────
Falsifiable after 1 tests, and 8 shrinks
<expectation_failure/expectation/error/condition>
all(res >= 0 & res <= 1) is not TRUE

`actual`:   <NA>
`expected`: TRUE

Backtrace:
    ▆
 1. └─quickcheck::for_all(...)
[...TRUNCATED...]

Counterexample:
$a
[1] -2413 -2413

Backtrace:
    ▆
 1. └─quickcheck::for_all(...)
 2. └─hedgehog::forall(...)
Dang it, now it fails when I pass two identical values! This happens for the same reason as above: max(x) - min(x) returns 0, as a quick check with the shrunk counterexample shows below. I won’t spend more time on this example, you get the idea.
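Here is that quick check, reusing the counterexample found by shrinking with the fixed normalize():
# identical values: max(x) - min(x) is again 0
normalize(c(-2413, -2413))
[1] NaN NaN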
Besides this basic example, where could this be useful?
When working with compiled code (C++, Rust, etc.), it can happen that a bug makes the R session crash (== segfault == “bomb icon” in RStudio). This can be extremely annoying as we lose all data and computations that were stored in memory. When we work with compiled code, there’s one property that our code should follow:
Calling a function should never lead to a segfault.
This happened to me a few months ago. I was investigating some code that used igraph::cluster_fast_greedy(). I know almost nothing about igraph; I was just playing around with arguments, and suddenly… crash. I reported this situation (igraph#2459), which was promptly fixed (thank you igraph devs!), but one sentence in the explanation caught my eye: “it is a rare use case to only want modularity but not membership, and avoiding membership calculation doesn’t have any advantages.”
I have no particular problem with this sentence or the rationale behind it; it makes sense to prioritize fixes that affect a larger audience. But it got me thinking: could we try all combinations of inputs to see whether any of them makes the session crash? We could use parameterized tests for this, but then again we would need to hardcode at least some possible values for the parameters. We could start by testing only TRUE/FALSE values for all combinations of parameters, but what if the user passes a string?
I think this is a situation where property-based testing is helpful: we know that no matter the input type, its length, and the values of the other inputs, the session shouldn’t crash. Implementing it with quickcheck looks fairly simple:
library(igraph, warn.conflicts = FALSE)
test_that("cluster_fast_greedy doesn't crash", {
# setup a graph, from the examples of ?cluster_fast_greedy
g <- make_full_graph(5) %du% make_full_graph(5) %du% make_full_graph(5)
g <- add_edges(g, c(1, 6, 1, 11, 6, 11))
for_all(
merges = any_atomic(any_na = TRUE),
modularity = any_atomic(any_na = TRUE),
membership = any_atomic(any_na = TRUE),
weights = any_atomic(any_na = TRUE),
property = function(merges, modularity, membership, weights) {
suppressWarnings(
try(
cluster_fast_greedy(g, merges = merges, modularity = modularity, membership = membership, weights = weights),
silent = TRUE
)
)
expect_true(TRUE)
}
)
})
Test passed 😀
I didn’t really know what expectation to use here: I don’t care whether the function errors or not, I just want it not to segfault. So I wrapped the call in try(silent = TRUE) and added a fake expectation.
I have spent some time working on tidypolars, a package that provides the same interface as the tidyverse but uses polars under the hood. This means that there should be as few “surprises” as possible for the user: the behavior of the functions available in tidypolars should match the behavior of their tidyverse counterparts. Once again, this can be tedious to check. One example is the function stringr::str_sub(). We can start with basic examples, such as:
stringr::str_sub(string = "foo", start = 1, end = 2)
[1] "fo"
Easy enough to test. But what happens if string is missing? Or if start > end? Or if end is negative? Or if start is negative and end is NULL and the length of start is greater than the length of string? Manually adding tests for all of those is painful and increases the risk of forgetting a corner case.
It is better here to use property-based testing: we don’t need to check the exact values returned by the functions implemented in tidypolars, we only need to check that they match the output of the corresponding tidyverse functions.
One additional difficulty here is that sometimes throwing an error is the correct behavior. Therefore, we need to create a custom expectation that checks that the outputs of tidypolars and of the tidyverse are identical, or that both functions error (see the testthat vignette on creating custom expectations):
expect_equal_or_both_error <- function(object, other) {
  # evaluate the tidypolars expression and record whether it errors
  polars_error <- FALSE
  polars_res <- tryCatch(
    object,
    error = function(e) polars_error <<- TRUE
  )
  # same for the tidyverse expression
  other_error <- FALSE
  other_res <- suppressWarnings(
    tryCatch(
      other,
      error = function(e) other_error <<- TRUE
    )
  )
  if (isTRUE(polars_error)) {
    # if tidypolars errored, the tidyverse must error too
    testthat::expect(isTRUE(other_error), "tidypolars errored but tidyverse didn't.")
  } else {
    # otherwise, both outputs must be identical
    testthat::expect_equal(polars_res, other_res)
  }
  invisible(NULL)
}
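As a sketch of how this expectation can be plugged into a property-based test: str_sub_polars() below is a placeholder I define purely for illustration (the real test would go through a polars data frame and tidypolars’ own str_sub()), and the generators are the same ones used earlier in this post.
# Placeholder standing in for the tidypolars implementation; here it simply
# defers to stringr so that the sketch is runnable.
str_sub_polars <- function(string, start, end) {
  stringr::str_sub(string, start, end)
}

test_that("str_sub() behaves like stringr::str_sub()", {
  for_all(
    string = any_atomic(any_na = TRUE),
    start = any_atomic(any_na = TRUE),
    end = any_atomic(any_na = TRUE),
    property = function(string, start, end) {
      expect_equal_or_both_error(
        str_sub_polars(string, start, end),
        stringr::str_sub(string, start, end)
      )
    }
  )
})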
Property-based testing will not replace all other kinds of tests, and it is not appropriate in every context. Still, it can help uncover bugs and segfaults, and it adds confidence in our code by randomly checking that it works even with implausible inputs.