validate
extending validation syntax
This was possible in earlier development versions. For now, I think it is a good idea to leave it out and hard-code the common membership/comparison and boolean operators. It will be fairly easy to make this more flexible in version 0.2.
Love this package, thanks for all the great work on it.
@markvanderloo are you thinking about working on this enhancement any time soon? It would be extremely useful to be able to define custom syntax somehow and not be restricted to the expressions in validating_call.
Yes, I think it's time we gave this some serious thought now. At the moment you can always work around it in simple cases. For example, if you have some function foo() that returns a logical, you can always use the rule foo(x) == TRUE. It is also possible to use functions from other packages or even self-defined functions. Self-defined functions have to be in the global environment, for example by source()-ing a file that defines them.
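To make the workaround concrete, here is a minimal sketch. The helper is_valid_code(), the column name code, and the toy data are all made up for illustration; only validator(), confront(), values() and summary() come from the validate package. It assumes the helper is defined in the global environment, as described above.

```r
library(validate)

# Hypothetical helper (not part of validate): TRUE where the value looks like
# two capital letters followed by three digits.
is_valid_code <- function(x) grepl("^[A-Z]{2}[0-9]{3}$", x)

# Toy data, invented for this example.
dat <- data.frame(code = c("AB123", "zz9", "CD456"), stringsAsFactors = FALSE)

# Wrapping the call in a comparison turns it into a rule validate accepts.
rules <- validator(code_ok = is_valid_code(code) == TRUE)

out <- confront(dat, rules)
result <- unlist(values(out), use.names = FALSE)  # one logical per record
summary(out)
```

The comparison with TRUE is what makes the expression pass as a validation rule; the custom logic itself lives entirely in the helper.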
One thing to add to this. I recently had a problem where I wanted to validate every column with exactly the same rule, to do PII checking. This can explode to potentially thousands of rules, and even with var_group it didn't really make sense for me to type out the column names. I wrote a script that uses a special key, "___" (three underscores), in any expression that should become a separate rule for every column. I also keep all my rules in csv files so that I can share them with novice validators, who can then use our web tool, which runs the validate package on the backend (https://wincowger.shinyapps.io/validate/). Simple example below in case this is useful for others or could be included in future versions of the package.
The script requires a rules file and a dataset to validate. Attached files: PII.csv, PII_Rules.csv
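library(validate)
library(data.table)
library(dplyr)

# Rules table; the code below expects `rule` and `name` columns
rules <- read.csv("PII.csv")
# Data set to validate
data_formatted <- read.csv("PII_Rules.csv")

# Rules containing the "___" placeholder are applied to every column
do_to_all <- rules %>%
  filter(grepl("___", rule))

if (nrow(do_to_all) > 0) {
  # Make one copy of each placeholder rule per column, substituting the
  # column name into both the expression and the rule name, then append
  # the ordinary rules unchanged
  rules <- lapply(colnames(data_formatted), function(new_name) {
    do_to_all %>%
      mutate(rule = gsub("___", new_name, rule)) %>%
      mutate(name = paste0(new_name, "_", name))
  }) %>%
    rbindlist(.) %>%
    bind_rows(rules %>% filter(!grepl("___", rule)))
}

rules_formatted <- validator(.data = rules)
report <- confront(data_formatted, rules_formatted)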
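For anyone who wants to try the placeholder idea without the csv files, here is a rough self-contained sketch of the same expansion in base R. The rule table, the data, and every column and rule name in it are invented for illustration and are not the contents of the attached files; the only validate calls used are validator(.data = ...), confront() and summary().

```r
library(validate)

# Invented rule table: one placeholder rule and one ordinary rule.
rule_table <- data.frame(
  name = c("no_email", "age_range"),
  rule = c('grepl("@", ___) == FALSE', "age >= 0"),
  stringsAsFactors = FALSE
)

# Invented data set.
dat <- data.frame(
  id      = c("a1", "b2"),
  contact = c("none", "x@example.org"),
  age     = c(31, 45),
  stringsAsFactors = FALSE
)

# Split placeholder rules from ordinary rules.
is_template <- grepl("___", rule_table$rule, fixed = TRUE)
templates   <- rule_table[is_template, ]
all_rules   <- rule_table

if (nrow(templates) > 0) {
  # One copy of every placeholder rule per column of the data.
  expanded <- do.call(rbind, lapply(colnames(dat), function(col) {
    data.frame(
      name = paste0(col, "_", templates$name),
      rule = gsub("___", col, templates$rule, fixed = TRUE),
      stringsAsFactors = FALSE
    )
  }))
  all_rules <- rbind(expanded, rule_table[!is_template, ])
}

v   <- validator(.data = all_rules)
out <- confront(dat, v)
summary(out)
```

Here the single placeholder rule becomes three rules (id_no_email, contact_no_email, age_no_email), and the ordinary age_range rule is carried over unchanged.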