assertthat
mid-pipe assertions
Often when troubleshooting a "long" %>% pipe, if I need to test assertions on the data, I have to either interrupt the pipe (if no grouping is present) or use a do() block (with or without grouping).
library(dplyr)
cyls <- 6
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  summarize(z = max(density(mpg)$y))
This works as one might expect. If you run it with cyls <- 4, though, the vs=0 group contains only one row, and the pipe errors out:
cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  mutate(z = max(density(mpg)$y))
# Error in mutate_impl(.data, dots) :
# need at least 2 points to select a bandwidth automatically
In order to assert that sufficient data is present, you either need to use a do() block or break up the pipe:
library(assertthat)
cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  do({
    assert_that(
      length(na.omit(.$mpg)) > 1,
      msg = "I cannot grok the data"
    )
    .
  }) %>%
  summarize(z = max(density(mpg)$y))
# Error: I cannot grok the data
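The other option mentioned above, breaking up the pipe, might look roughly like the sketch below. The intermediate names (grouped, counts, check) are made up for illustration, and try() is only there so the failure can be inspected without aborting the script.

```r
library(dplyr)
library(assertthat)

cyls <- 4

# Stop the pipe after grouping and keep the intermediate result.
grouped <- mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs)

# Count the non-missing mpg values in each group.
counts <- grouped %>%
  summarize(n_obs = sum(!is.na(mpg)))

# try() captures the assertion failure; in real code you would
# normally let the error propagate instead.
check <- try(
  assert_that(all(counts$n_obs > 1), msg = "I cannot grok the data"),
  silent = TRUE
)

# Resume the pipe only if the assertion passed.
if (!inherits(check, "try-error")) {
  grouped %>% summarize(z = max(density(mpg)$y))
}
```

This works, but it needs two extra intermediate objects for a single check, which is the friction the proposal is trying to remove.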
It would be nice to be able to test the assertion mid-pipe:
cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  assert_pipe_stop(
    length(na.omit(mpg)) > 1,
    .msg = "I cannot grok the data"
  ) %>%
  summarize(z = max(density(mpg)$y))
Granted, in this contrived example, the error should be sufficient, but it's not hard to consider longer pipelines where calculation should not continue without verified conditions.
I think assertion-companion functions such as assert_pipe_stop and perhaps assert_pipe_warning might be useful. I think it makes more sense to extend assertthat to be pipe-aware than to add assertions to dplyr or another of the tidyverse packages.
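To make the idea concrete, here is a minimal sketch of how such helpers could behave. None of this exists in assertthat or dplyr: assert_pipe_stop, assert_pipe_warning, and the internal check_groups are hypothetical, and the sketch leans on dplyr's summarise() and rlang's enquo() to evaluate the condition once per group.

```r
library(dplyr)

# Hypothetical internal worker: takes an already-captured quosure,
# evaluates it once per group, and signals if any group fails.
check_groups <- function(.data, cond, .msg, signal) {
  checks <- summarise(.data, ok = isTRUE(all(!!cond)))
  if (!all(checks$ok)) {
    if (is.null(.msg)) {
      .msg <- paste(rlang::as_label(cond), "is not TRUE")
    }
    signal(.msg, call. = FALSE)
  }
  # Hand the data back unchanged so the pipe can continue.
  .data
}

# Hypothetical pipe-aware assertions (not part of any package).
assert_pipe_stop <- function(.data, expr, .msg = NULL) {
  check_groups(.data, rlang::enquo(expr), .msg, stop)
}
assert_pipe_warning <- function(.data, expr, .msg = NULL) {
  check_groups(.data, rlang::enquo(expr), .msg, warning)
}

# With cyls <- 6 every vs group has more than one row, so the pipe runs through.
cyls <- 6
res <- mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  assert_pipe_stop(length(na.omit(mpg)) > 1, .msg = "I cannot grok the data") %>%
  summarize(z = max(density(mpg)$y))

# With cyl == 4 the vs=0 group has a single row, so the assertion stops the pipe.
err <- tryCatch(
  mtcars %>%
    filter(cyl == 4) %>%
    group_by(vs) %>%
    assert_pipe_stop(length(na.omit(mpg)) > 1, .msg = "I cannot grok the data"),
  error = function(e) conditionMessage(e)
)
```

Delegating the per-group evaluation to summarise() sidesteps most of the NSE work, at the cost of tying the helper to dplyr.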
Thoughts?
I'm willing to work on a PR, though admittedly I'm not as proficient at non-standard evaluation (NSE), which these functions would rely on heavily.
I'd find this really useful! I honestly still have trouble using the do workaround as well, so being able to stop a pipe execution if a condition isn't met would be great. I tried using assert_that directly with the %T>% operator from magrittr in the hopes that it would allow the pipe to continue, but wasn't able to get it working. Will poke around some more.
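One way the tee approach can be made to work, as a sketch rather than a recommendation: wrap the assertion in braces so that magrittr treats it as a lambda and exposes the incoming data as the dot, instead of inserting the data as the first argument of assert_that(). The per-group count inside the braces is an illustrative choice, not something from the post above.

```r
library(dplyr)
library(magrittr)
library(assertthat)

res <- mtcars %>%
  filter(cyl == 6) %>%
  group_by(vs) %T>%
  {
    # Inside the braces, `.` is the grouped data from the left-hand side.
    n_obs <- summarize(., n = sum(!is.na(mpg)))$n
    assert_that(all(n_obs > 1), msg = "I cannot grok the data")
  } %>%
  summarize(z = max(density(mpg)$y))
```

Because %T>% discards the value of the braced block and passes the original data along, the summarize() at the end still sees the grouped tibble.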
Consider using assertr for this functionality: https://cran.r-project.org/web/packages/assertr/vignettes/assertr.html
Isn't this what the env argument of assert_that() is for?
Below, I assign a global variable d. I then show three pipes, each with an assert_that() call embedded via the tee operator and a mutate() on the tibble at the end. The first assert_that() should return TRUE; the second tests a column of the tibble and should fail; the third tests d and should fail. They seem to work as intended.
> d <- 2
> tibble( a=1, b=2 ) %T>% assert_that( a==1 & d==2, env= . ) %>% mutate( b=3 )
# A tibble: 1 x 2
      a     b
  <dbl> <dbl>
1     1     3
> tibble( a=1, b=2 ) %T>% assert_that( a==11 & d==2, env= . ) %>% mutate( b=3 )
Error: a == 11 & d == 2 is not TRUE
> tibble( a=1, b=2 ) %T>% assert_that( a==1 & d==22, env= . ) %>% mutate( b=3 )
Error: a == 1 & d == 22 is not TRUE