`get_data_extracts()` gets less data for `rows_distinct()` than for `col_vals_*()`
Description
get_data_extracts() behaves differently depending on the validation function: col_vals_*() and conjointly() on one side, rows_distinct() on the other. For rows_distinct(), the returned tibble contains only the tested columns, whereas for the other functions it contains all columns of the table.
Reproducible example
library(pointblank)
library(dplyr)
tbl <- tibble(id1=1:5,
id2=c("A", "b", "C", "D", "E"),
a = c(8, 8, 8, 5, 9),
b = c(11,11:14),
date = as.Date(paste0("2023-01-0",1:5)))
# The column or set of columns that need to be displayed
# (to help identify the row) alongside the column with the invalid value
id_columns <- c("id1", "id2")
agent <-
create_agent(
tbl = tbl,
tbl_name = "small_table",
label = "An example."
) %>%
col_vals_gt(columns = vars(a), value = 6) %>%
col_vals_gt(columns = vars(b), value = 11) %>%
col_vals_regex(columns = vars(id2), regex = "[A-Z]") %>%
rows_distinct(columns = vars(a)) %>%
rows_distinct(columns = c("b")) %>%
rows_distinct(columns = c("a", "b")) %>%
conjointly(
~ col_vals_lt(., columns = vars(a), value = 7),
~ col_vals_gt(., columns = vars(a), value = vars(b))) %>%
col_is_date(columns = "date") %>%
interrogate()
agent
agent %>% get_agent_report(display_table = FALSE)
# Loop over each step and display a selection of columns from failing rows
for (c_step in seq_len(nrow(get_agent_report(agent, display_table = FALSE)))) {
print("====================")
get_agent_x_list(agent, i = c_step)$briefs %>% print
print(c("current step: ", c_step))
get_agent_x_list(agent, i = c_step)$columns %>% print
columns_to_display <- unique(c(id_columns, get_agent_x_list(agent, i = c_step)$columns))
get_data_extracts(agent, i=c_step) %>%
select(columns_to_display) %>% # comment this line out to see the result
print
}
Expected result
For the col_vals_*() and conjointly() functions, get_data_extracts() returns all columns, which makes it possible to then select the columns one wishes to keep for display.
However, for rows_distinct(columns = vars(a)), only the 'a' column remains. I do not know of a way to retrieve the full failing rows with pointblank.
Using agent %>% get_agent_report(display_table = TRUE), the same issue holds for the "CSV" buttons.
In the example, we want two columns, id1 and id2, to be displayed (to help identify the failing row) along with the column with invalid values.
Commenting out the following line in the code above helps see the difference in behaviour:
select(columns_to_display) %>%
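Until the extract carries all columns, one possible workaround is to recompute the duplicated rows directly with dplyr, outside of pointblank. The sketch below is only an illustration under that assumption: `duplicated_rows()` is a hypothetical helper, not a pointblank function, and it does not reuse the agent's extract.

```r
library(dplyr)

# Same data as in the reproducible example (without the date column)
tbl <- tibble(
  id1 = 1:5,
  id2 = c("A", "b", "C", "D", "E"),
  a = c(8, 8, 8, 5, 9),
  b = c(11, 11:14)
)

# Hypothetical helper (not part of pointblank): return every full row whose
# combination of values in `cols` occurs more than once in `data`.
duplicated_rows <- function(data, cols) {
  data %>%
    group_by(across(all_of(cols))) %>%
    filter(n() > 1) %>%
    ungroup()
}

# Full rows that would fail rows_distinct(columns = vars(a))
dup_a <- duplicated_rows(tbl, "a")

# Keep the id columns next to the tested column for display
dup_a %>% select(all_of(c("id1", "id2", "a")))
```

Because the helper filters the original table, the id columns are still available for selection, which is the behaviour the extracts from col_vals_*() already allow.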
Session info
sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.04
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8
[4] LC_COLLATE=fr_FR.UTF-8 LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
[7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.1.2 pointblank_0.11.4
loaded via a namespace (and not attached):
[1] rstudioapi_0.14 xml2_1.3.3 magrittr_2.0.3 tidyselect_1.2.0 gt_0.9.0 R6_2.5.1
[7] rlang_1.1.0 fastmap_1.1.1 fansi_1.0.4 tools_4.2.2 xfun_0.39 utf8_1.2.3
[13] blastula_0.3.3 cli_3.6.1 withr_2.5.0 commonmark_1.9.0 htmltools_0.5.5 digest_0.6.31
[19] tibble_3.2.1 lifecycle_1.0.3 crayon_1.5.2 sass_0.4.5 base64enc_0.1-3 vctrs_0.6.2
[25] glue_1.6.2 compiler_4.2.2 pillar_1.9.0 generics_0.1.3 markdown_1.6 pkgconfig_2.0.3
I edited my example: it was missing c(...) in columns_to_display <- unique(c(id_columns, get_agent_x_list(agent, i = c_step)$columns))