Feature request: Hide inactive tests from validation report

Open jl5000 opened this issue 1 year ago • 3 comments

In my use case we have a master dataset containing all columns and rows. This data is then used for bespoke downstream analyses, often on subsets of the data. I have a function which creates a validation report for the master dataset, but I would also like to run this same function on subsets of the data and only apply/show the tests that are relevant to each subset. The report will be viewed by stakeholders, so I'd prefer not to show them lots of greyed-out rows.

I thought I would be able to do this by editing the interrogated agent, but it doesn't seem to work. (Incidentally, I was puzzled why the active column is a list column).

I have a similar situation when creating the data dictionary: it would be good if this skipped columns that don't exist, so I could reuse the same code.

library(pointblank)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

x <- iris |> 
  create_agent() |> 
  col_exists("Petal.Length",
             active = has_columns(iris, Petal.Length)) |> 
  col_exists("Spec",
             active = has_columns(iris, Spec)) |> 
  col_exists("Sepal.Length",
             active = has_columns(iris, Sepal.Length)) |> 
  interrogate()

x$validation_set <- filter(x$validation_set, unlist(active))
x
#> Error in if (assertion_type[x] == "serially" && !is.na(agent$validation_set[x, : missing value where TRUE/FALSE needed

Created on 2024-08-12 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.utf8
#>  ctype    English_United Kingdom.utf8
#>  tz       Europe/London
#>  date     2024-08-12
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  blastula      0.3.5   2024-02-24 [1] CRAN (R 4.4.1)
#>  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
#>  digest        0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
#>  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.4.1)
#>  evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.4.1)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
#>  fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#>  knitr         1.48    2024-07-07 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
#>  pointblank  * 0.12.1  2024-03-25 [1] CRAN (R 4.4.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
#>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
#>  rmarkdown     2.27    2024-05-17 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.1)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.1)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.1)
#>  withr         3.0.1   2024-07-31 [1] CRAN (R 4.4.1)
#>  xfun          0.46    2024-07-18 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
#> 
#>  [1] C:/Program Files/R/R-4.4.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

jl5000, Aug 12 '24 12:08

I think I'd prefer this to be handled via post-processing in {gt} (you should get the "hide inactive rows" behavior for free once we get something like gt::rows_hide(); see https://github.com/rstudio/gt/issues/975). Unfortunately, I don't think it's currently possible to do this post hoc.

In the meantime, the missing hack in your solution is to also align validation_set$i. So given your agent x, this should work:

x$validation_set <- filter(x$validation_set, sapply(active, isTRUE))
x$validation_set$i <- seq_len(nrow(x$validation_set))
x

Note that this isn't public API (!!), though my hope is that this workaround can be made a bit more painless until we get the more principled solution from {gt} (especially w.r.t. needing to update the i column - this surprised me too).


A note for the future - error is triggered here:

https://github.com/rstudio/pointblank/blob/59bb48e7bc8de03dfb96b3f54270f38fa11a3d49/R/get_agent_report.R#L451

yjunechoe, Aug 12 '24 13:08

Separately, to your comment:

> Incidentally, I was puzzled why the active column is a list column

This is because active can also hold expressions that evaluate to TRUE/FALSE. For example, if you specify the has_columns() condition as a ~ formula, you get to keep a record of that (and not simply whether they evaluated to TRUE/FALSE):

x <- iris |> 
  create_agent() |> 
  col_exists("Petal.Length",
             active = ~ . %>% has_columns(Petal.Length)) |> 
  col_exists("Spec",
             active = ~ . %>% has_columns(Spec)) |> 
  col_exists("Sepal.Length",
             active = ~ . %>% has_columns(Sepal.Length)) |> 
  interrogate()
  
x$validation_set$active
#> [[1]]
#> ~. %>% has_columns(Petal.Length)
#> 
#> [[2]]
#> ~. %>% has_columns(Spec)
#> 
#> [[3]]
#> ~. %>% has_columns(Sepal.Length)

So while active works for your specific example, you should instead read eval_active, which is the logical vector column you're looking for:

x$validation_set$eval_active
#> [1]  TRUE FALSE  TRUE
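
Combining this with the earlier workaround, the two steps (keep only rows where eval_active is TRUE, then renumber i) could be wrapped in a small helper. This is only a sketch: drop_inactive_steps() is a hypothetical name, not a pointblank function, and the data frame below is a simplified stand-in for x$validation_set rather than a real one. It still relies on internals that are not public API.

```r
# Hypothetical helper (not part of pointblank): keep only the steps that
# evaluated as active, then renumber `i` so it runs 1..n again.
drop_inactive_steps <- function(validation_set) {
  # `%in% TRUE` treats NA as "not active" instead of propagating NA
  keep <- validation_set$eval_active %in% TRUE
  out <- validation_set[keep, , drop = FALSE]
  out$i <- seq_len(nrow(out))
  out
}

# Simplified stand-in for x$validation_set (assumed columns only)
vs <- data.frame(
  i = 1:3,
  column = c("Petal.Length", "Spec", "Sepal.Length"),
  eval_active = c(TRUE, FALSE, TRUE)
)

res <- drop_inactive_steps(vs)
res$column
res$i
```

With the interrogated agent x from above, the equivalent call would be x$validation_set <- drop_inactive_steps(x$validation_set), followed by printing x.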

yjunechoe, Aug 12 '24 13:08

Many thanks! I shall take your advice for the interim workarounds and wait for the {gt} functionality :)

I'm happy for you to close this issue if you would like.

jl5000, Aug 12 '24 14:08

@jl5000 Hiding rows by filtering on $validation_set is still not part of the public API, but your mental model of equivalence between the rows of $validation_set and the rows of the agent report is accurate.

On dev, the intuitive behavior from your reprex should now work!

yjunechoe, Sep 10 '24 14:09