`check_that` throws cryptic errors when run after a pipeline step fails

Open mrkaye97 opened this issue 2 years ago • 5 comments

Sorry in advance if this has already been asked -- I haven't seen anything about it. Pasting a reprex that does a better job explaining what's going on than I can:

suppressPackageStartupMessages({
  library(dplyr)
  library(validate)
})

iris %>%
  filter(
    foobar > 2
  )
#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foobar > 2`.
#> x object 'foobar' not found

iris %>%
  filter(
    foo > 3
  ) %>%
  mutate(
    bar = Sepal.Length + 1
  )
#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

## problem:
iris %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  )
#> Error in (function (cond) : error in evaluating the argument 'dat' in selecting a method for function 'confront': Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

## big problem:
iris %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  )
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

Created on 2021-08-13 by the reprex package (v2.0.0)

Basically, filter() fails and then check_that() seems not to know what to do, so it prints a bunch of junk before the actual error. This gets worse when you chain 100 steps together, because the repeated prefix becomes too long to print or parse, so it's hard to figure out what's actually going wrong.

Is this a known issue / conscious choice? And if it is, what's the best way to handle this behavior?

Thanks!

mrkaye97 avatar Aug 13 '21 17:08 mrkaye97

The first error msg I see comes from filter. This is not a validate function. I'd go after that first. Maybe use dplyr::filter, and similarly for the other functions? (I'm not near a computer right now, so I can't test.)

markvanderloo avatar Aug 13 '21 21:08 markvanderloo

@markvanderloo Oh yeah, I know the error is coming from dplyr::filter(). The point here was that the checks run fine when filter() works (and it does, I just told it to filter by foo which doesn't exist), but when filter() bombs, it seems like I just get this long, unhelpful error message. Here's a reprex to show filter working:

suppressPackageStartupMessages({
  library(dplyr)
  library(validate)
})

iris <- tibble::as_tibble(iris)

## Filtering works fine
iris %>%
  filter(
    Sepal.Length > 5
  )
#> # A tibble: 118 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          5.4         3.9          1.7         0.4 setosa 
#>  3          5.4         3.7          1.5         0.2 setosa 
#>  4          5.8         4            1.2         0.2 setosa 
#>  5          5.7         4.4          1.5         0.4 setosa 
#>  6          5.4         3.9          1.3         0.4 setosa 
#>  7          5.1         3.5          1.4         0.3 setosa 
#>  8          5.7         3.8          1.7         0.3 setosa 
#>  9          5.1         3.8          1.5         0.3 setosa 
#> 10          5.4         3.4          1.7         0.2 setosa 
#> # … with 108 more rows

## Filtering and then piping into check_that also works fine
iris %>%
  filter(
    Sepal.Length > 5
  ) %>%
  check_that(
    Sepal.Length > 5
  )
#> Object of class 'validation'
#> Call:
#>     check_that(., Sepal.Length > 5)
#> 
#> Rules confronted: 1
#>    With fails   : 0
#>    With missings: 0
#>    Threw warning: 0
#>    Threw error  : 0


## problem: When `filter` fails, `check_that` throws a gibberish error
iris %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  )
#> Error in (function (cond) : error in evaluating the argument 'dat' in selecting a method for function 'confront': Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

## big problem: Multiple `check_that`s means multiple repetitions of this error
## arbitrarily many of them, as the chain gets bigger
iris %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  ) %>%
  filter(
    foo > 3
  ) %>%
  check_that(
    Sepal.Length < 1000
  )
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': error in evaluating the argument 'dat' in selecting a method for function 'confront': Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

## When filter fails and pipes into mutate, however, we still
##  get the same informative error we'd expect
iris %>%
  filter(
    foo > 3
  ) %>%
  mutate(
    bar = Sepal.Length + 1
  )
#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

Created on 2021-08-13 by the reprex package (v2.0.0)

I'm not an expert on the codebase, but my suspicion from this error is that there's an S3 method for confront that's getting NULL or something similar, and is trying to call confront.null and is getting confused. Again, not an expert, but that's what it seems like might be happening.
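A note on the mechanism: the phrase "in selecting a method for function 'confront'" is the wording R's S4 dispatch uses when evaluating an argument fails, which suggests the wrapper comes from S4 method selection rather than from an S3 method receiving NULL. The sketch below reproduces the same shape of message with no validate involved. `demo_confront` is a made-up generic standing in for `confront`; it is an assumption for illustration, not validate's actual code.

```r
library(methods)

# Hypothetical S4 generic standing in for validate's `confront`.
setGeneric("demo_confront", function(dat, ...) standardGeneric("demo_confront"))
setMethod("demo_confront", "data.frame", function(dat, ...) nrow(dat))

# `foo` is not a column of iris, so subset() errors while S4 is still
# evaluating `dat` to decide which method to call.
msg <- tryCatch(
  demo_confront(subset(iris, foo > 2)),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

The captured message should carry the dispatch prefix followed by the original `subset()` error, much like the validate output above.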

mrkaye97 avatar Aug 13 '21 21:08 mrkaye97

Also, FWIW, my mental model for what should happen here is that check_that shouldn't run if filter fails (just like mutate doesn't, or at least doesn't seem to). It seems to me like that's the real issue.
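One way to probe that mental model, as a sketch and not a description of validate's internals: with a hypothetical S4 generic (`demo_confront`, assumed here as a stand-in for `confront`), the method body is never entered when its argument fails to evaluate. The error is raised during dispatch, before a method is chosen, so the downstream step arguably does not run; only the message gets re-wrapped.

```r
library(methods)

method_ran <- FALSE

# Hypothetical generic; not validate's actual definition.
setGeneric("demo_confront", function(dat, ...) standardGeneric("demo_confront"))
setMethod("demo_confront", "data.frame", function(dat, ...) {
  method_ran <<- TRUE   # record that the method body was reached
  nrow(dat)
})

# Failing upstream step: the error is raised before the method body runs.
result <- tryCatch(
  demo_confront(subset(iris, foo > 2)),
  error = function(e) "upstream error"
)
ran_after_failure <- method_ran
print(ran_after_failure)

# Working upstream step: the method body runs as usual.
n_rows <- demo_confront(subset(iris, Sepal.Length < 6))
ran_after_success <- method_ran
print(ran_after_success)
```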

mrkaye97 avatar Aug 13 '21 21:08 mrkaye97

Ok, so I now understand your question better. The error:

#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

is not thrown by check_that(). One clue is that it uses colorized output, which we use nowhere in validate. So it must ultimately come from dplyr or magrittr. If I use the R pipe, I get the same message so it must be dplyr or one of its dependencies.

> iris |> filter(foo>3) |> check_that(Sepal.Length>0)
Error in (function (cond)  : 
  error in evaluating the argument 'dat' in selecting a method for function 'confront': Problem with `filter()` input `..1`.
ℹ Input `..1` is `foo > 3`.
✖ object 'foo' not found

markvanderloo avatar Aug 15 '21 16:08 markvanderloo

Thanks @markvanderloo. I think we're still getting our wires crossed. I know the error

#> Error: Problem with `filter()` input `..1`.
#> ℹ Input `..1` is `foo > 3`.
#> x object 'foo' not found

is coming from dplyr, but that isn't the issue I'm talking about. What I'm talking about is this piece of the error:

Error in (function (cond)  : 
  error in evaluating the argument 'dat' in selecting a method for function 'confront'

which is clearly coming from validate, not dplyr or magrittr. I think the fact that you get the same behavior with the base R pipe is good evidence of that, but here's a reprex using no dplyr that shows the same issue:

library(validate)
library(magrittr)

## Broken, but no dplyr
iris %>%
    subset(
        foo > 2
    ) %>%
    check_that(
        Sepal.Length < 100
    )
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'dat' in selecting a method for function 'confront': object 'foo' not found

## Also broken, but no dplyr and no pipe
check_that(
    subset(
        iris,
        foo > 2
    ),
    Sepal.Length < 100
)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'dat' in selecting a method for function 'confront': object 'foo' not found

## Works fine
check_that(
    subset(
        iris,
        Sepal.Length < 6
    ),
    Sepal.Length < 100
)
#> Object of class 'validation'
#> Call:
#>     check_that(subset(iris, Sepal.Length < 6), Sepal.Length < 100)
#> 
#> Rules confronted: 1
#>    With fails   : 0
#>    With missings: 0
#>    Threw warning: 0
#>    Threw error  : 0

Created on 2021-08-15 by the reprex package (v2.0.1)

What I'm saying is that I think that validate::check_that doesn't know what to do when something that gets passed into its dat argument throws an error, which seems to me to be a bug. Let me know if I can be more helpful than this or if it's still not clear what I'm getting at. I think this has nothing to do with dplyr or magrittr though. It really seems to me like it might be an S3-related error in validate or confront, but I'm not 100% sure.
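The repeated prefix in the long chains can be reproduced the same way. In the sketch below, `demo_step` is a hypothetical pass-through S4 generic (an assumption, not part of validate); each dispatching layer that sits downstream of the failing call prepends its own copy of the prefix as the error propagates outward.

```r
library(methods)

# Hypothetical pass-through generic, standing in for any step that
# selects its method via S4 dispatch on `dat`.
setGeneric("demo_step", function(dat, ...) standardGeneric("demo_step"))
setMethod("demo_step", "data.frame", function(dat, ...) dat)

# Three dispatching layers wrapped around one failing base R call.
msg <- tryCatch(
  demo_step(demo_step(demo_step(subset(iris, foo > 2)))),
  error = function(e) conditionMessage(e)
)

# Count how many times the dispatch prefix was prepended.
hits <- gregexpr("error in evaluating the argument", msg, fixed = TRUE)[[1]]
n_prefixes <- sum(hits > 0)
print(n_prefixes)
```

This is consistent with the five-step chain above showing five copies of the prefix before the underlying message.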

To reiterate from before: My mental model for what should happen in check_that when dat throws an error (like here) is that check_that should throw the same error without adding this other simpleError stuff:

Error in (function (cond)  : 
  error in evaluating the argument 'dat' in selecting a method for function 'confront'

I would expect check_that to error out without adding that message, and just do what dplyr does and throw the first error. Here's a base R reprex showing what nested subset() calls do when the inner call errors:

subset(
    subset(
        iris,
        foo > 2
    ),
    Sepal.Length > 3
)
#> Error in eval(e, x, parent.frame()): object 'foo' not found

Created on 2021-08-15 by the reprex package (v2.0.1)

That's the kind of behavior I'd expect from check_that, too.
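A possible way to get that behavior on the calling side, sketched under assumptions: evaluate the data argument before it reaches the dispatching function, so the upstream error surfaces unwrapped. `demo_confront` and `demo_check` are hypothetical names used only for illustration; neither is part of validate.

```r
library(methods)

# Hypothetical S4 generic standing in for `confront`.
setGeneric("demo_confront", function(dat, ...) standardGeneric("demo_confront"))
setMethod("demo_confront", "data.frame", function(dat, ...) nrow(dat))

# Ordinary wrapper: evaluate `dat` first, then hand the value to the generic.
demo_check <- function(dat, ...) {
  force(dat)               # any upstream error is raised here, unwrapped
  demo_confront(dat, ...)
}

wrapped <- tryCatch(demo_confront(subset(iris, foo > 2)),
                    error = function(e) conditionMessage(e))
plain   <- tryCatch(demo_check(subset(iris, foo > 2)),
                    error = function(e) conditionMessage(e))

cat("via generic:", wrapped, "\n")
cat("via wrapper:", plain, "\n")
```

In a real pipeline the same effect should be available without a wrapper: assign the result of the data steps to a variable and call check_that on that variable, since a failing step then errors at the assignment.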

Let me know if this doesn't clear up what I'm thinking

mrkaye97 avatar Aug 15 '21 16:08 mrkaye97