pointblank
Using arrow results in error "not really a table object"
Prework
- [x] Read and agree to the code of conduct and contributing guidelines.
- [x] If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
- [x] Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
- [x] Runnable: post enough R code and data so any onlooker can create the error on their own computer.
- [x] Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
- [x] Readable: format your code according to the tidyverse style guide.
Description
When interrogating an agent whose table is an arrow object, I get the following error: The 'table' in this validation step is not really a table object.
When I convert the arrow table to a data.frame first, pointblank works as expected:
create_agent(as.data.frame(df)) |> # NOTE the as.data.frame here
  col_is_numeric(vars(x)) |>
  interrogate()
#> ── Interrogation Started - there is a single validation step ────────────────────────────────────────────────
#> ✔ Step 1: OK.
#> ── Interrogation Completed ──────────────────────────────────────────────────────────────────────────────────
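The same workaround can presumably be applied to an on-disk arrow Dataset by materialising it before handing it to pointblank. The following is a minimal sketch under that assumption; the temporary directory and the names `ds_path` and `local_df` are illustrative only, and note that `collect()` pulls the whole dataset into memory.

```r
library(pointblank)
library(arrow)

# Write a small arrow table to a temporary directory and reopen it as a Dataset
ds_path <- file.path(tempdir(), "arrow-dataset")
write_dataset(arrow_table(x = 1:3, y = c("a", "b", "c")), ds_path)
ds <- open_dataset(ds_path)

# Materialise the Dataset as an in-memory tibble before validation
local_df <- dplyr::collect(ds)

# pointblank now receives an ordinary data frame instead of an arrow object
agent <- create_agent(local_df) |>
  col_is_numeric(vars(x)) |>
  interrogate()
```

For data that does not fit in memory this is not a real solution, which is why native support would still be useful.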
Reproducible example
library(pointblank)
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
df <- arrow_table(x = 1:3, y = c("a", "b", "c"))
agent <- create_agent(df) |>
  col_is_numeric(vars(x))
agent |> get_agent_report(display_table = FALSE)
#> # A tibble: 1 × 14
#> i type columns values precon active eval units n_pass f_pass W S
#> <int> <chr> <chr> <chr> <chr> <lgl> <chr> <int> <int> <dbl> <lgl> <lgl>
#> 1 1 col_… x <NA> <NA> NA <NA> NA NA NA NA NA
#> # … with 2 more variables: N <lgl>, extract <lgl>
agent |> interrogate() |> get_agent_report(display_table = FALSE)
#> # A tibble: 1 × 14
#> i type columns values precon active eval units n_pass f_pass W S
#> <int> <chr> <chr> <chr> <chr> <lgl> <chr> <int> <int> <dbl> <lgl> <lgl>
#> 1 1 col_… x <NA> <NA> TRUE ERROR NA NA NA NA NA
#> # … with 2 more variables: N <lgl>, extract <int>
# repeat with an arrow Dataset opened from disk --------------------
write_dataset(df, "arrow-dataset")
ds <- open_dataset("arrow-dataset")
agent <- create_agent(ds) |>
  col_is_numeric(vars(x))
agent |> get_agent_report(display_table = FALSE)
#> # A tibble: 1 × 14
#> i type columns values precon active eval units n_pass f_pass W S
#> <int> <chr> <chr> <chr> <chr> <lgl> <chr> <int> <int> <dbl> <lgl> <lgl>
#> 1 1 col_… x <NA> <NA> NA <NA> NA NA NA NA NA
#> # … with 2 more variables: N <lgl>, extract <lgl>
agent |> interrogate() |> get_agent_report(display_table = FALSE)
#> # A tibble: 1 × 14
#> i type columns values precon active eval units n_pass f_pass W S
#> <int> <chr> <chr> <chr> <chr> <lgl> <chr> <int> <int> <dbl> <lgl> <lgl>
#> 1 1 col_… x <NA> <NA> TRUE ERROR NA NA NA NA NA
#> # … with 2 more variables: N <lgl>, extract <int>
Created on 2023-04-17 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23)
#> os Ubuntu 18.04.6 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2023-04-17
#> pandoc 2.18 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> arrow * 11.0.0.3 2023-03-08 [1] RSPM (R 4.2.1)
#> assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.2.1)
#> bit 4.0.4 2020-08-04 [1] RSPM (R 4.2.1)
#> bit64 4.0.5 2020-08-30 [1] RSPM (R 4.2.1)
#> blastula 0.3.3 2023-01-07 [1] RSPM (R 4.2.1)
#> cli 3.6.0 2023-01-09 [1] RSPM (R 4.2.1)
#> digest 0.6.31 2022-12-11 [1] RSPM (R 4.2.1)
#> dplyr 1.1.1 2023-03-22 [1] RSPM (R 4.2.1)
#> evaluate 0.16 2022-08-09 [1] RSPM (R 4.2.1)
#> fansi 1.0.3 2022-03-24 [1] RSPM (R 4.2.1)
#> fastmap 1.1.1 2023-02-24 [1] RSPM (R 4.2.1)
#> fs 1.6.1 2023-02-06 [1] RSPM (R 4.2.1)
#> generics 0.1.3 2022-07-05 [1] RSPM (R 4.2.1)
#> glue 1.6.2 2022-02-24 [1] RSPM (R 4.2.1)
#> htmltools 0.5.4 2022-12-07 [1] RSPM (R 4.2.1)
#> knitr 1.42 2023-01-25 [1] RSPM (R 4.2.1)
#> lifecycle 1.0.3 2022-10-07 [1] RSPM (R 4.2.1)
#> magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.2.1)
#> pillar 1.8.1 2022-08-19 [1] RSPM (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.2.1)
#> pointblank * 0.11.3 2023-02-09 [1] RSPM (R 4.2.1)
#> purrr 1.0.1 2023-01-10 [1] RSPM (R 4.2.1)
#> R6 2.5.1 2021-08-19 [1] RSPM (R 4.2.1)
#> reprex 2.0.2 2022-08-17 [2] RSPM (R 4.2.1)
#> rlang 1.1.0 2023-03-14 [1] RSPM (R 4.2.1)
#> rmarkdown 2.16 2022-08-24 [1] RSPM (R 4.2.1)
#> rstudioapi 0.14 2022-08-22 [2] RSPM (R 4.2.1)
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.2.1)
#> tibble 3.2.1 2023-03-20 [1] RSPM (R 4.2.1)
#> tidyselect 1.2.0 2022-10-10 [1] RSPM (R 4.2.1)
#> utf8 1.2.2 2021-07-24 [1] RSPM (R 4.2.1)
#> vctrs 0.6.1 2023-03-22 [1] RSPM (R 4.2.1)
#> withr 2.5.0 2022-03-03 [2] RSPM (R 4.2.1)
#> xfun 0.38 2023-03-24 [1] RSPM (R 4.2.1)
#> yaml 2.3.7 2023-01-23 [1] RSPM (R 4.2.1)
#>
#> [1] /home/NAME/R/x86_64-pc-linux-gnu-library/4.2
#> [2] /usr/r-library/admin-library/4.2
#> [3] /opt/R/4.2.1/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Thanks for reporting this and providing a lot of details! This is definitely not right and requires a fix.
FWIW, I see this more as a feature request than a bug. Think of arrow as another backend. I think the error message is informative (though not perfect). An arrow dataset is neither a data.frame nor a database table, so I would expect the current approach not to work. I couldn't find a claim in the {pointblank} documentation that the tbl argument of create_agent() can be an arrow::Table.
As a first suggestion, it would be great to have the documentation of the supported backends in a more prominent location (e.g. a paragraph on the {pkgdown} site).
A second suggestion: maybe, as a first step, error with a clear message that arrow tables (or datasets, etc.) are not (yet) supported, and open a follow-up issue to implement support for arrow inputs? (I have done some work on {arrow} in the past, and I think this might not be a trivial endeavour.)
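To make the second suggestion concrete, here is a rough sketch of what such a guard could look like. `assert_tbl_supported()` is a hypothetical helper, not part of pointblank's API, and the message wording is only a suggestion. It relies on arrow's R6 objects (Table, Dataset, etc.) inheriting from the "ArrowObject" class and on lazy dplyr queries having class "arrow_dplyr_query".

```r
# Hypothetical helper (not part of pointblank): fail early on arrow inputs
assert_tbl_supported <- function(tbl) {
  # Arrow R6 objects share the "ArrowObject" base class; lazy dplyr
  # pipelines built on them have class "arrow_dplyr_query"
  if (inherits(tbl, c("ArrowObject", "arrow_dplyr_query"))) {
    stop(
      "Arrow inputs (class '", class(tbl)[1], "') are not yet supported. ",
      "Convert first, e.g. with as.data.frame() or dplyr::collect().",
      call. = FALSE
    )
  }
  invisible(tbl)
}

# Stand-in object carrying the class vector of an arrow Table,
# so the sketch runs without arrow installed
fake_arrow_tbl <- structure(
  list(),
  class = c("Table", "ArrowTabular", "ArrowObject", "R6")
)

# Capture the error message instead of aborting the script
msg <- tryCatch(
  assert_tbl_supported(fake_arrow_tbl),
  error = function(e) conditionMessage(e)
)
```

Ordinary data frames pass through unchanged, so a check like this could sit at the top of create_agent() without affecting supported backends.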
(By the way, thanks a lot for the great package and for the R in Pharma workshop.)