
mid-pipe assertions

Open r2evans opened this issue 7 years ago • 3 comments

Often when troubleshooting a "long" %>% pipe, if I need to test assertions on the data, I have to either interrupt the pipe (which works only if no grouping is present) or use a do() block (which works with or without grouping).

library(dplyr)
cyls <- 6
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  summarize(z = max(density(mpg)$y))

works as one might expect. If you run a similar pipe with cyls <- 4, though, the vs == 0 group contains only one row, and the pipe errors out.

cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  mutate(z = max(density(mpg)$y))
# Error in mutate_impl(.data, dots) : 
#   need at least 2 points to select a bandwidth automatically

In order to assert that sufficient data is present, you either need to use a do() block or break up the pipe:

library(assertthat)
cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  do({
    assert_that(
      length(na.omit(.$mpg)) > 1,
      msg = "I cannot grok the data"
    )
    .
  }) %>%
  summarize(z = max(density(mpg)$y))
# Error: I cannot grok the data

It would be nice to be able to test the assertion mid-pipe:

cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  assert_pipe_stop(
    length(na.omit(mpg)) > 1,
    .msg = "I cannot grok the data"
  ) %>%
  summarize(z = max(density(mpg)$y))

Granted, in this contrived example, the error itself should be sufficient, but it's not hard to imagine longer pipelines where calculation should not continue unless certain conditions have been verified.

I think assertion-companion functions such as assert_pipe_stop and perhaps assert_pipe_warning might be useful. I think it makes more sense to extend assertthat to be pipe-aware rather than adding assertions to dplyr or another of the tidyverse packages.

Thoughts?

I'm willing to work on a PR, though admittedly I'm not as proficient at NSE, where these functions would heavily reside.
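For illustration only, here is a minimal sketch of how the proposed assert_pipe_stop might behave. It is not part of assertthat or dplyr; the function body, the `.ok` helper column, and the default message are assumptions, and it relies on dplyr >= 1.0.0 (for the `.groups` argument) plus rlang for the NSE part. The idea is to let summarise() evaluate the condition once per group, stop if any group fails, and otherwise hand the data back untouched.

```r
library(dplyr)
library(rlang)

# Hypothetical helper; NOT part of assertthat or dplyr.
# Evaluates each condition once per group (or once overall for ungrouped
# data) and stops unless every group yields TRUE.
assert_pipe_stop <- function(.data, ..., .msg = NULL) {
  conds <- enquos(...)
  for (cond in conds) {
    # summarise() respects grouping, so `.ok` holds one value per group
    res <- summarise(.data, .ok = all(!!cond), .groups = "drop")
    if (!isTRUE(all(res$.ok))) {
      msg <- if (is.null(.msg)) paste(as_label(cond), "is not TRUE") else .msg
      stop(msg, call. = FALSE)
    }
  }
  .data  # pass the data through unchanged so the pipe can continue
}

# With cyls <- 6 every vs group has more than one row, so the pipe continues
cyls <- 6
checked <- mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  assert_pipe_stop(
    length(na.omit(mpg)) > 1,
    .msg = "I cannot grok the data"
  ) %>%
  summarize(z = max(density(mpg)$y))
```

An assert_pipe_warning companion could presumably follow the same shape, swapping stop() for warning() and still returning .data.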

r2evans, Jun 26 '17 22:06

I'd find this really useful! I honestly still have trouble using the do workaround as well, so being able to stop a pipe execution if a condition isn't met would be great. I tried using assert_that directly with the %T>% operator from magrittr in the hopes that it would allow the pipe to continue, but wasn't able to get it working. Will poke around some more.
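One possible explanation for the %T>% trouble (a guess, not confirmed anywhere in this thread): when the dot only appears nested inside another call, magrittr still inserts the piped data as the first argument, so assert_that() would receive the data frame itself as its first assertion. Wrapping the call in braces avoids that. A small sketch on ungrouped data, assuming magrittr and assertthat are installed:

```r
library(magrittr)
library(assertthat)

# Braces stop magrittr from inserting the data as the first argument;
# %T>% then discards the TRUE returned by assert_that() and passes the
# original data on to the next step.
res <- mtcars %T>%
  { assert_that(nrow(.) > 1, msg = "I cannot grok the data") } %>%
  head(3)
```

Note that this checks the data frame as a whole; it does not evaluate the condition separately for each group.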

Zedseayou, Mar 21 '18 23:03

Consider using assertr for this functionality: https://cran.r-project.org/web/packages/assertr/vignettes/assertr.html

ArtemSokolov, May 26 '18 02:05

Isn't this what the env argument of assert_that() is for?

Below, I assign to the global variable d. I then show three pipes, each with an assert_that() call embedded via the tee operator and a mutate() on the tibble at the end. One assert_that() should return TRUE; one tests a column of the tibble and should fail; the other tests d and should fail. They seem to work as intended.

> d <- 2
> tibble( a=1, b=2 ) %T>% assert_that( a==1 & d==2, env= . ) %>% mutate( b=3 )
# A tibble: 1 x 2
      a     b
  <dbl> <dbl>
1     1     3
> tibble( a=1, b=2 ) %T>% assert_that( a==11 & d==2, env= . ) %>% mutate( b=3 )
Error: a == 11 & d == 2 is not TRUE
> tibble( a=1, b=2 ) %T>% assert_that( a==1 & d==22, env= . ) %>% mutate( b=3 )
Error: a == 1 & d == 22 is not TRUE
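One caveat worth checking against the original request: as far as I can tell, env = . evaluates the assertion against whole columns, so grouping is not taken into account. The sketch below reuses the cyls <- 4 data from the first post; the assertion passes because mpg has 11 values overall, even though one vs group has a single row.

```r
library(dplyr)
library(magrittr)
library(assertthat)

cyls <- 4
out <- mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %T>%
  assert_that(length(na.omit(mpg)) > 1, env = .)

# Rows per group: the assertion above did not catch the single-row group
group_sizes <- summarise(out, n = n())
```

So this approach looks fine for whole-table checks, while per-group checks would still seem to need do() or something like the assert_pipe_stop idea.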

PhilvanKleur, Jul 15 '20 03:07