power operation on aggregate result uses dplyr fallback

Open Tmonster opened this issue 4 months ago • 2 comments

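Some operations on aggregate results are available in DuckDB, but duckplyr still falls back to dplyr for them. In this case, applying `^` to the result of `cor()` inside `summarise()` triggers the fallback with the message "No translation for function `^`".

Discovered while benchmarking duckplyr with db-benchmark. This example comes from group-by query q9.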

repro

.libPaths("./duckplyr/r-duckplyr") # tidyverse/duckplyr#4641
suppressPackageStartupMessages(library("duckplyr", lib.loc="./duckplyr/r-duckplyr", warn.conflicts=FALSE))
ver = packageVersion("duckplyr")

src_grp = "test.csv"

x = as_duckplyr_tibble(data.table::fread(src_grp, showProgress=FALSE, na.strings="", data.table=FALSE))
print(nrow(x))

t = system.time(print(dim(ans<-x %>% summarise(.by = c(id2, id4), r2=cor(v1, v2, use="na.or.complete")^2))))[["elapsed"]]

The duckplyr package is configured to fall back to dplyr when it encounters an incompatibility. Fallback events can be collected and uploaded for analysis to guide future development.
By default, no data will be collected or uploaded.
ℹ A fallback situation just occurred. The following information would have been recorded:
  {"version":"0.4.1","message":"No translation for function
  `^`.","name":"summarise","x":{"...1":"character","...2":"character","...3":"character","...4":"integer","...5":"integer","...6":"integer","...7":"integer","...8":"integer","...9":"numeric"},"args":{"dots":{"...10":"cor(...7,
  ...8, use = \"<character>\")^2"},"by":["...2","...4"]}}
→ Run `duckplyr::fallback_sitrep()` to review the current settings.
→ Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging, and `Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)` in addition to enable printing of fallback situations
  to the console.
→ Run `duckplyr::fallback_review()` to review the available reports, and `duckplyr::fallback_upload()` to upload them.
ℹ See `?duckplyr::fallback()` for details.
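
The settings named in the message above can be applied before rerunning the repro so that each fallback is logged and printed. This is a minimal sketch: the environment variable names and `fallback_sitrep()` are copied from the console output above, and the `requireNamespace()`/`exists()` guard is only an added precaution so the snippet also runs where duckplyr is missing or differs in version.

```r
# Variable names are taken verbatim from the fallback message above.
Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)       # enable fallback logging
Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)    # also print fallbacks to the console

# Guard (an assumption of this sketch, not part of the original report):
# only call fallback_sitrep() if the installed duckplyr provides it.
if (requireNamespace("duckplyr", quietly = TRUE) &&
    exists("fallback_sitrep", envir = asNamespace("duckplyr"))) {
  duckplyr::fallback_sitrep()
}
```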

test.csv

id1,id2,id3,id4,id5,id6,v1,v2,v3
id010,id007,id0000329755,9,1,707298,1,7,45.741516
id007,id004,id0000136233,5,5,635644,3,1,7.932007
id006,id001,id0000306329,6,4,910916,1,8,92.181312
id007,id009,id0000194009,1,4,378004,3,7,35.369551
id010,id004,id0000067310,5,3,77126,5,5,27.005417
id006,id004,id0000733374,2,6,1416,3,13,3.830562
id007,id010,id0000723276,3,4,567333,5,11,18.993338
id007,id003,id0000191079,5,3,652736,4,1,35.720091
id009,id010,id0000364850,1,5,771296,5,8,90.567817
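
For cross-checking whatever result duckplyr or the dplyr fallback returns, the same per-group squared correlation can be computed in base R on the nine rows above. This is only a reference sketch, not a duckplyr workaround: the names `csv_text`, `groups`, and `r2` are illustrative, and groups with fewer than two rows are set to `NA` explicitly because a correlation is undefined for them.

```r
# Base R only; the data is the nine-row test.csv from this issue, inlined.
csv_text <- "id1,id2,id3,id4,id5,id6,v1,v2,v3
id010,id007,id0000329755,9,1,707298,1,7,45.741516
id007,id004,id0000136233,5,5,635644,3,1,7.932007
id006,id001,id0000306329,6,4,910916,1,8,92.181312
id007,id009,id0000194009,1,4,378004,3,7,35.369551
id010,id004,id0000067310,5,3,77126,5,5,27.005417
id006,id004,id0000733374,2,6,1416,3,13,3.830562
id007,id010,id0000723276,3,4,567333,5,11,18.993338
id007,id003,id0000191079,5,3,652736,4,1,35.720091
id009,id010,id0000364850,1,5,771296,5,8,90.567817"
x <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Row indices per (id2, id4) group, matching `.by = c(id2, id4)` in the repro.
groups <- split(seq_len(nrow(x)), list(x$id2, x$id4), drop = TRUE)

# Squared correlation of v1 and v2 within each group.
r2 <- vapply(groups, function(idx) {
  if (length(idx) < 2) return(NA_real_)  # correlation undefined for one row
  r <- cor(x$v1[idx], x$v2[idx], use = "na.or.complete")
  r * r                                  # same value as r^2
}, numeric(1))

print(r2)
```

In this small sample only the group `id2 == "id004", id4 == 5` has more than one row, so it is the only group with a non-missing value.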

============================= OLD ISSUE BEFORE EDIT (CAN IGNORE) ======================

repro

library(duckplyr)
library(DBI)
x = as_duckplyr_tibble(iris)
x %>% arrange(sum(Sepal.Length)^2)

The duckplyr package is configured to fall back to dplyr when it encounters an incompatibility. Fallback events can be collected and uploaded for analysis to guide future
development. By default, no data will be collected or uploaded.
ℹ A fallback situation just occurred. The following information would have been recorded:
  {"version":"0.4.1","message":"Can't convert columns of class <factor> to relational. Affected column:
  `...5`.","name":"arrange","x":{"...1":"numeric","...2":"numeric","...3":"numeric","...4":"numeric","...5":"factor"},"args":{"dots":["sum(...1)^2"],".by_group":false}}
→ Run `duckplyr::fallback_sitrep()` to review the current settings.
→ Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging, and `Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)` in addition to enable printing of fallback
  situations to the console.
→ Run `duckplyr::fallback_review()` to review the available reports, and `duckplyr::fallback_upload()` to upload them.
ℹ See `?duckplyr::fallback()` for details.

This occurs on group-by query 9 of the db-benchmark.

Tmonster, Sep 25 '24 15:09