validate
Something like mapply() for validation rules
Hi Mark, I want to check, for many columns, whether each one contains only values from its respective codelist. As far as I know, there is no shortcut for writing these kinds of rules, since using more than one var_group() results in the cartesian product of the groups.
So what I would find helpful is the following:
- I have a list of variables and
- a list of codelists of the same length.
Within the definition of a validation rule, I would like to use something like (pseudo-code):
mapply(function(var, codelist) var %in% codelist,
       var = var_group(var_A, var_B, var_C),
       codelist = list(cl_A, cl_B, cl_C))
So this should map over both var and codelist and thus create only three validation rules when fed into validator().
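For comparison, this is the pairing that base R's mapply() already does outside of validate. A minimal sketch with made-up data (the columns, codelists, and the name checks are hypothetical and only there for illustration):

```r
# Hypothetical data and codelists, for illustration only
dat <- data.frame(
  var_A = c("a", "b", "z"),
  var_B = c("1", "2", "3"),
  var_C = c("u", "v", "w"),
  stringsAsFactors = FALSE
)
codelists <- list(
  var_A = c("a", "b", "c"),
  var_B = c("1", "2", "3"),
  var_C = c("u", "v")
)

# Pair the i-th column with the i-th codelist: three checks,
# not the 3 x 3 cartesian product
checks <- mapply(
  function(var, codelist) var %in% codelist,
  var = dat[names(codelists)],
  codelist = codelists
)

# Number of invalid values per column
colSums(!checks)
```

This only shows the intended one-to-one mapping; it does not produce validate rules, which is the part I am missing.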
To make this even clearer, maybe have a look at how map() is used as a transformation within the {drake} package: https://books.ropensci.org/drake/static.html#map
This deviates from the pseudo-code above but might be a better way to actually implement this? (I have no idea)
What are your thoughts on this?
Hi Matthias, I think we should support something for this. One thing you can do is externalize the code lists as follows:
library(validate)
dat <- data.frame(
x = c("a","a","v","c","b")
, y = c("321","321","123","231","444")
)
codelists <- list(
foo = c("a","b","c")
, bar = c("123","231","312","213","132","321")
)
rules <- validator(
x %in% foo
, y %in% bar
)
out <- confront(dat, rules, ref=codelists)
summary(out)
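Until there is dedicated support, one possible workaround is to generate the rule expressions as strings and pass them in through the .data argument of validator(). This is only a sketch: vars, lists, and rule_df are hypothetical names introduced here, and it assumes the variable names and codelist names are kept in two parallel character vectors. The data and codelists are repeated so the block runs on its own.

```r
library(validate)

dat <- data.frame(
    x = c("a","a","v","c","b")
  , y = c("321","321","123","231","444")
)
codelists <- list(
    foo = c("a","b","c")
  , bar = c("123","231","312","213","132","321")
)

# Parallel vectors: the i-th variable is checked against the i-th codelist
vars  <- c("x", "y")
lists <- c("foo", "bar")

# One rule per (variable, codelist) pair, written as text
rule_df <- data.frame(
    rule  = sprintf("%s %%in%% %s", vars, lists)
  , label = sprintf("%s in codelist %s", vars, lists)
  , stringsAsFactors = FALSE
)

# Read the rules from the data frame instead of typing them out
rules <- validator(.data = rule_df)
out <- confront(dat, rules, ref = codelists)
summary(out)
```

This yields one rule per pair rather than a cartesian product, at the cost of building the rules as text.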