
Capture all requested steps/checks regardless of whether they match columns in the current data

Status: Open. emilyriederer opened this issue 3 years ago (0 comments).

Description

This is a follow-on issue to #232. As summarized by @rich-iannone:

(1) the expressions in columns should be evaluated during rule creation
(2) the expressions in columns should be re-evaluated during interrogation
(3) cases where no column is available during interrogation should be handled (i.e., skip the step but preserve it, unlike now)
(4) the expansion of validation steps might need to be reconsidered (e.g., step 1 that targets 3 columns might be resolved as steps 1.1, 1.2, and 1.3 in the reporting)
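Points (2) to (4) above can be illustrated with a minimal R sketch. It is not pointblank's implementation: the step records, the `resolve_steps()` helper, and its output columns are hypothetical, and the sketch assumes `tidyselect::eval_select()` as the column resolver. Each step keeps its column selector unevaluated so it can be resolved again against whatever data is present at interrogation; a selector that matches nothing yields a preserved but skipped step, and a selector that matches several columns expands into numbered sub-steps.

```r
# Hypothetical sketch (not pointblank internals): each step keeps its column
# selector as an unevaluated expression so it can be re-resolved later.
steps <- list(
  list(step_id = 1, fn = "col_vals_gt", columns = rlang::expr(starts_with("x")), value = 10),
  list(step_id = 2, fn = "col_vals_lt", columns = rlang::expr(starts_with("z")), value = 2)
)

# Hypothetical helper: resolve every step against the data seen at interrogation.
resolve_steps <- function(steps, data) {
  rows <- lapply(steps, function(step) {
    # Re-evaluate the stored selector against the current columns
    cols <- names(tidyselect::eval_select(step$columns, data))
    if (length(cols) == 0) {
      # No matching column: preserve the step but flag it as skipped
      return(data.frame(step = as.character(step$step_id), fn = step$fn,
                        column = NA_character_, skipped = TRUE,
                        stringsAsFactors = FALSE))
    }
    # One sub-step per resolved column, numbered 1.1, 1.2, ...
    data.frame(step = paste0(step$step_id, ".", seq_along(cols)), fn = step$fn,
               column = cols, skipped = FALSE, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

resolve_steps(steps, data.frame(x_1 = 1, x_2 = 2, y_1 = 3))
```

With the reprex data, step 1 expands to sub-steps `1.1` (`x_1`) and `1.2` (`x_2`), while step 2 is kept as a single skipped row instead of disappearing. Note that a bare column name that is absent (e.g. `z_1`) makes `eval_select()` raise an error rather than return zero columns, so a fuller version would need `tryCatch()` around that call.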

The motivation for this issue is that pointblank does not register a validation step when its column selection matches no columns in the target data. However, this means a pipeline can become "brittle" when it is subsequently used (or written to YAML and reused) for a dataset whose columns have changed.

Reproducible example

The following example illustrates the current way pointblank captures steps:

library(pointblank)
tmpdir <- tempdir()
# keep only the file name; yaml_write() combines it with `path`
tmp <- basename(tempfile(fileext = ".yml"))

create_agent(read_fn = ~data.frame(x_1 = 1, x_2 = 2, y_1 = 3)) %>%
  col_vals_gt(starts_with("x"), 10, step_id = 1) %>%
  col_vals_lt(starts_with("z"),  2, step_id = 2) %>%
  yaml_write(filename = tmp, path = tmpdir)

cat(readLines(file.path(tmpdir, tmp)), sep = "\n")
#> read_fn: ~data.frame(x_1 = 1, x_2 = 2, y_1 = 3)
#> tbl_name: ~
#> label: '[2021-02-09|06:03:29]'
#> locale: en
#> steps:
#> - col_vals_gt:
#>     columns: starts_with("x")
#>     value: 10.0

Created on 2021-02-09 by the reprex package (v0.3.0)
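To see why the second step is missing from the YAML, the two selectors can be resolved by hand against the reprex data. This sketch assumes pointblank's column selection follows tidyselect semantics and calls `tidyselect::eval_select()` directly (it is not a pointblank function): `starts_with("x")` matches two columns, while `starts_with("z")` matches none, which is consistent with no `col_vals_lt` step being written.

```r
# Same data as in the reprex
df <- data.frame(x_1 = 1, x_2 = 2, y_1 = 3)

# Resolve each selector against the column names of `df`
sel_x <- tidyselect::eval_select(rlang::expr(starts_with("x")), df)
sel_z <- tidyselect::eval_select(rlang::expr(starts_with("z")), df)

names(sel_x)   # "x_1" "x_2"
length(sel_z)  # 0: no column starts with "z"
```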

Expected result

Ideally, pointblank should also register a step for the starts_with("z") check, even though no column in the dataset currently used with the agent starts with "z".

Session info

This issue is not session-specific. It was observed with pointblank version 0.6.0.9000.


emilyriederer, Feb 10 '21 01:02