duckplyr
power operation on aggregate result uses dplyr fallback
Some aggregation expressions are supported by DuckDB, but duckplyr still falls back to dplyr when it encounters them.
I discovered this while benchmarking duckplyr with db-benchmark. This example comes from group-by query q9.
repro
.libPaths("./duckplyr/r-duckplyr") # tidyverse/duckplyr#4641
suppressPackageStartupMessages(library("duckplyr", lib.loc="./duckplyr/r-duckplyr", warn.conflicts=FALSE))
ver = packageVersion("duckplyr")
src_grp = "test.csv"
x = as_duckplyr_tibble(data.table::fread(src_grp, showProgress=FALSE, na.strings="", data.table=FALSE))
print(nrow(x))
t = system.time(print(dim(ans<-x %>% summarise(.by = c(id2, id4), r2=cor(v1, v2, use="na.or.complete")^2))))[["elapsed"]]
The duckplyr package is configured to fall back to dplyr when it encounters an incompatibility. Fallback events can be collected and uploaded for analysis to guide future development.
By default, no data will be collected or uploaded.
ℹ A fallback situation just occurred. The following information would have been recorded:
{"version":"0.4.1","message":"No translation for function `^`.","name":"summarise","x":{"...1":"character","...2":"character","...3":"character","...4":"integer","...5":"integer","...6":"integer","...7":"integer","...8":"integer","...9":"numeric"},"args":{"dots":{"...10":"cor(...7, ...8, use = \"<character>\")^2"},"by":["...2","...4"]}}
→ Run `duckplyr::fallback_sitrep()` to review the current settings.
→ Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging, and `Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)` in addition to enable printing of fallback situations to the console.
→ Run `duckplyr::fallback_review()` to review the available reports, and `duckplyr::fallback_upload()` to upload them.
ℹ See `?duckplyr::fallback()` for details.
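To see the fallback report for every call while reproducing this, the settings named in the message above can be switched on before running the repro. This is a minimal sketch that uses only the environment variables and the function the message itself mentions; the `requireNamespace()` guard is my own addition so that the snippet also runs where duckplyr is not installed.

```r
# Enable fallback logging, as suggested by the duckplyr message.
Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)
# Additionally print each fallback situation to the console.
Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)

# Review the current fallback settings (guard added for illustration only).
if (requireNamespace("duckplyr", quietly = TRUE)) {
  duckplyr::fallback_sitrep()
}
```

With these set, rerunning the `summarise()` call from the repro should print the same "No translation for function" report on each fallback.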
test.csv
id1,id2,id3,id4,id5,id6,v1,v2,v3
id010,id007,id0000329755,9,1,707298,1,7,45.741516
id007,id004,id0000136233,5,5,635644,3,1,7.932007
id006,id001,id0000306329,6,4,910916,1,8,92.181312
id007,id009,id0000194009,1,4,378004,3,7,35.369551
id010,id004,id0000067310,5,3,77126,5,5,27.005417
id006,id004,id0000733374,2,6,1416,3,13,3.830562
id007,id010,id0000723276,3,4,567333,5,11,18.993338
id007,id003,id0000191079,5,3,652736,4,1,35.720091
id009,id010,id0000364850,1,5,771296,5,8,90.567817
============================= OLD ISSUE BEFORE EDIT (CAN IGNORE) ======================
repro
library(duckplyr)
library(DBI)
x = as_duckplyr_tibble(iris)
x %>% arrange(sum(Sepal.Length)^2)
The duckplyr package is configured to fall back to dplyr when it encounters an incompatibility. Fallback events can be collected and uploaded for analysis to guide future development.
By default, no data will be collected or uploaded.
ℹ A fallback situation just occurred. The following information would have been recorded:
{"version":"0.4.1","message":"Can't convert columns of class <factor> to relational. Affected column: `...5`.","name":"arrange","x":{"...1":"numeric","...2":"numeric","...3":"numeric","...4":"numeric","...5":"factor"},"args":{"dots":["sum(...1)^2"],".by_group":false}}
→ Run `duckplyr::fallback_sitrep()` to review the current settings.
→ Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging, and `Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)` in addition to enable printing of fallback situations to the console.
→ Run `duckplyr::fallback_review()` to review the available reports, and `duckplyr::fallback_upload()` to upload them.
ℹ See `?duckplyr::fallback()` for details.
This occurs on group-by query 9 of db-benchmark.