pointblank
Capture all requested steps/checks regardless of whether they match columns in the current data
Description
This is a follow-on issue to #232. As summarized by @rich-iannone:
(1) the evaluation of expressions in columns should occur during rule creation
(2) the re-evaluation of expressions in columns should happen again during interrogation
(3) cases where no column is available during interrogation should be handled (i.e., skip the step but preserve it, unlike now)
(4) the expansion of validation steps might need to be reconsidered (e.g., step 1 that targets 3 columns might be resolved as steps 1.1, 1.2, and 1.3 in the reporting)
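The behaviour described in points (2) to (4) can be sketched in base R. This is only an illustration of the idea, not pointblank's implementation: `new_step()`, `starts_with_prefix()`, and `interrogate_steps()` are hypothetical helpers invented here, and the creation-time evaluation from point (1) is left out for brevity.

```r
# Hypothetical, simplified stand-ins; none of these are pointblank functions.

# A step stores the column *selector* (a function of column names), not the
# columns it happened to match when the step was declared.
new_step <- function(id, selector, check) {
  list(id = id, selector = selector, check = check)
}

# Base R selector loosely mimicking the spirit of starts_with().
starts_with_prefix <- function(prefix) {
  function(col_names) col_names[startsWith(col_names, prefix)]
}

# Interrogation: re-resolve every selector against the data at hand.
interrogate_steps <- function(steps, data) {
  results <- lapply(steps, function(step) {
    cols <- step$selector(names(data))
    if (length(cols) == 0) {
      # No matching column: keep the step in the report, marked as skipped.
      return(data.frame(step = as.character(step$id),
                        column = NA_character_,
                        status = "skipped",
                        stringsAsFactors = FALSE))
    }
    # Expand one declared step into sub-steps (1.1, 1.2, ...), one per column.
    passed <- vapply(cols, function(col) all(step$check(data[[col]])),
                     logical(1))
    data.frame(step = paste(step$id, seq_along(cols), sep = "."),
               column = cols,
               status = ifelse(unname(passed), "pass", "fail"),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

steps <- list(
  new_step(1, starts_with_prefix("x"), function(v) v > 10),
  new_step(2, starts_with_prefix("z"), function(v) v < 2)
)

# Same plan, two datasets: the "z" step is skipped for the first dataset
# and becomes active once a column starting with "z" appears.
report_a <- interrogate_steps(steps, data.frame(x_1 = 1, x_2 = 2, y_1 = 3))
report_b <- interrogate_steps(steps, data.frame(x_1 = 11, z_1 = 1))
print(report_a)
print(report_b)
```

Because the selector is stored rather than its result, the same plan can be reused on data whose columns have changed, and a step that matches nothing still appears in the report instead of silently disappearing.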
The motivation for this issue is that pointblank does not register steps whose validation check matches no columns in the current data. As a result, a pipeline can become "brittle" when it is subsequently used (or written to YAML and reused) with a dataset whose columns have changed.
Reproducible example
The following example illustrates the current way pointblank captures steps:
library(pointblank)
tmpdir <- tempdir()
# Generate a unique file name (without a directory part) for the YAML output
tmp <- basename(tempfile(fileext = ".yml"))
create_agent(read_fn = ~data.frame(x_1 = 1, x_2 = 2, y_1 = 3)) %>%
  col_vals_gt(starts_with("x"), 10, step_id = 1) %>%
  col_vals_lt(starts_with("z"), 2, step_id = 2) %>%
  yaml_write(filename = tmp, path = tmpdir)
cat(readLines(file.path(tmpdir, tmp)), sep = "\n")
#> read_fn: ~data.frame(x_1 = 1, x_2 = 2, y_1 = 3)
#> tbl_name: ~
#> label: '[2021-02-09|06:03:29]'
#> locale: en
#> steps:
#> - col_vals_gt:
#> columns: starts_with("x")
#> value: 10.0
Created on 2021-02-09 by the reprex package (v0.3.0)
Expected result
Ideally, pointblank should also register a step for our starts_with("z") check even though no variables in the dataset currently used with the agent start with "z".
Session info
This issue is not session-specific. It was created using package version 0.6.0.9000.