
Something like mapply() for validation rules

Open matthiasgomolka opened this issue 4 years ago • 1 comments

Hi Mark, I want to check, for many columns, whether each one contains only values from its respective codelist. As far as I know, there is no shortcut for writing these kinds of rules, since using more than one var_group() results in the Cartesian product of these groups.

So what I would find helpful is the following:

  1. I have a list of variables and
  2. a list of codelists of the same length.

Within the definition of a validation rule, I would like to use something like (pseudo-code):

mapply(function(var, codelist) {var %in% codelist},
       var = var_group(var_A, var_B, var_C),
       codelist = list(cl_A, cl_B, cl_C)
)

So this should map over both var and codelist and thus create only three validation rules when fed into validator().

To make this clearer, have a look at how map() is used as a transformation within the {drake} package: https://books.ropensci.org/drake/static.html#map This deviates from the pseudo-code above but might be a better way to actually implement this (I have no idea).

What are your thoughts on this?

matthiasgomolka, Jan 29 '20 08:01

Hi Matthias, I think we should support something for this. One thing you can do is externalize the code lists as follows:

library(validate)

dat <- data.frame(
    x = c("a","a","v","c","b")
  , y = c("321","321","123","231","444")
)

codelists <- list(
    foo = c("a","b","c")
  , bar = c("123","231","312","213","132","321") 
)

rules <- validator(
    x %in% foo
  , y %in% bar
)


out <- confront(dat, rules, ref=codelists)
summary(out)
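
With many columns, the rules themselves could perhaps be generated pairwise rather than typed out. The following is only a sketch, assuming the installed version of validate accepts a data frame of rules through the `.data` argument of validator(); the pairing vectors `vars` and `lists` and the rule names are hypothetical and are not part of the package.

```r
library(validate)

dat <- data.frame(
    x = c("a","a","v","c","b")
  , y = c("321","321","123","231","444")
)

codelists <- list(
    foo = c("a","b","c")
  , bar = c("123","231","312","213","132","321")
)

# Hypothetical pairing: the i-th variable is checked against the i-th code list
vars  <- c("x", "y")
lists <- c("foo", "bar")

# One rule per pair (not the cartesian product), built as text
rule_df <- data.frame(
    rule        = sprintf("%s %%in%% %s", vars, lists)
  , name        = paste0("codelist_", vars)
  , description = sprintf("%s takes values from code list %s", vars, lists)
  , stringsAsFactors = FALSE
)

rules <- validator(.data = rule_df)

out <- confront(dat, rules, ref = codelists)
summary(out)
```

Here sprintf() is vectorised over both vectors, so it plays the role of mapply() in the request: two variables and two code lists give two rules.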

markvanderloo, Feb 20 '20 09:02