duckplyr
A drop-in replacement for dplyr, powered by DuckDB for performance.
It would be nice to be able to do something like `duckdb::fallback_reporting(TRUE)`. I think that could write a file to `tools::R_user_dir("duckplyr", "cache")`/`DUCKPLYR_FALLBACK_LOG_DIR` that has lower precedence than the environment variables....
I think it would be useful to include a couple of illustrative benchmarks that show the benefits of duckplyr. Maybe one where there's some optimisation that duckdb makes that substantially...
I think it would be useful to document `mutate()`, `filter()`, etc. methods so it's obvious that duckplyr supports them. This would also be a good place to document any differences...
I'm not sure how to get these files to you, but this causes my R session to hang: ```R paths
I was thinking that `read_csv_lazy()` and `read_parquet_lazy()` might more clearly convey their usage.
I think you should start with `library(duckplyr)` and then tell folks that if they don't want to attach it (i.e. if they're writing a package), they can instead use `as_duckplyr_tibble()`....
e.g. in the readme: ``` out$mean_bill_area #> materializing: #> #> [1] 770.2627 656.8523 694.9360 819.7503 984.2279 ``` I'd suggest something like: ``` out$mean_bill_area #> materializing query: see last_materialization() for details...
I'd say there are three vignettes there:
* A duckplyr getting started
* Telemetry
* Extensibility
Hi all, thanks for such an excellent package! I'm using {duckplyr} as a dependency in my package [{censobr}](https://ipeagit.github.io/censobr/) and it works super nicely but it generates [this strange error in...
The roundtrip of nested columns seems to work; now we need operations that can work on them. Let's start with `tidyr::nest()` and `tidyr::unnest()`.