Tom Ebergen
If you have a wide build table and a skinny probe table, it may be cheaper to build on the skinny probe side. I have a micro benchmark to test this. There...
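The trade-off above can be illustrated with a small sketch. It compares picking the build side by row count alone with picking it by estimated bytes (rows × row width). A hash table has to materialize the payload columns of the build side, so a wide table with few rows can still be the more expensive side to build on. `JoinSide`, `choose_build_side` and `choose_build_side_by_rows` are hypothetical names for illustration; this is not DuckDB's cost model.

```python
from dataclasses import dataclass


@dataclass
class JoinSide:
    """Hypothetical summary of one join input (not a DuckDB type)."""
    name: str
    estimated_rows: int
    row_width_bytes: int

    @property
    def estimated_bytes(self) -> int:
        # The build side's payload is materialized in the hash table,
        # so its footprint grows with rows * width, not rows alone.
        return self.estimated_rows * self.row_width_bytes


def choose_build_side_by_rows(left: JoinSide, right: JoinSide) -> tuple[JoinSide, JoinSide]:
    """Return (build, probe) using cardinality only."""
    if left.estimated_rows <= right.estimated_rows:
        return left, right
    return right, left


def choose_build_side(left: JoinSide, right: JoinSide) -> tuple[JoinSide, JoinSide]:
    """Return (build, probe), building on the side with fewer estimated bytes."""
    if left.estimated_bytes <= right.estimated_bytes:
        return left, right
    return right, left


wide = JoinSide("wide", estimated_rows=1_000, row_width_bytes=400)      # 400,000 bytes
skinny = JoinSide("skinny", estimated_rows=10_000, row_width_bytes=8)   # 80,000 bytes
```

With these made-up numbers, row count alone would build on `wide`, while the byte estimate builds on `skinny`, the case the micro benchmark is meant to probe.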
Fixes https://github.com/duckdblabs/duckdb-internal/issues/2434
Closes https://github.com/duckdb/duckdb/issues/11042 The optimization to convert mark joins to semi joins should not always happen. If a mark join index is in a projection operator from before the mark join,...
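Why the conversion is not always valid can be sketched as a simple check. A mark join keeps every input row and adds a boolean mark column, while a semi join drops the rows without a match. Replacing one with the other is only safe when the mark is consumed solely by a filter that keeps rows where the mark is true; if anything else, such as a projection, reads the mark column, the dropped rows and the mark values are still needed. `Reference` and `can_convert_mark_to_semi` are hypothetical names, and this is a simplified model rather than DuckDB's optimizer logic.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """Hypothetical: one place above the join where a column is read."""
    operator: str    # e.g. "FILTER" or "PROJECTION"
    column: str
    expression: str  # expression reading the column, e.g. "mark" or "NOT mark"


def can_convert_mark_to_semi(mark_column: str, references: list[Reference]) -> bool:
    """Return True only if every use of the mark column is a plain
    `WHERE mark` filter, so the mark join may be replaced by a semi join."""
    uses = [r for r in references if r.column == mark_column]
    # No filter on the mark means the mark join keeps all rows; a semi join would drop some.
    if not uses:
        return False
    # A projection (or a negated filter, which would need an anti join) blocks the rewrite.
    return all(r.operator == "FILTER" and r.expression == mark_column for r in uses)
```

For example, a plan whose only reference is `Reference("FILTER", "mark", "mark")` qualifies, but adding `Reference("PROJECTION", "mark", "mark")` makes the check fail.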
For some reason, writing the IPC format for the queries causes the 5GB+ join benchmark to hang. Removing this should get the results back. The lines in question: https://github.com/duckdblabs/db-benchmark/blob/269a9a77b950b36c7c812c57889aa41f23c0b98a/polars/join-polars.py#L56
Also helps to verify a fix for https://github.com/duckdblabs/duckdb-internal/issues/2723 Before: ``` ┌─────────────┴─────────────┐ │ COMPARISON_JOIN │ │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │ INNER ├──────────────┐...
Currently, this logic is sprinkled across filter pull-up and filter pushdown. The problem is that it does not happen at every operator, and adding empty results across multiple files within the...
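To illustrate what handling empty results at every operator could look like, here is a minimal sketch of a single bottom-up pass over a toy logical plan. Each operator decides from its own semantics whether an empty child forces an empty output: an inner join is empty if either side is, a left join only if its left side is, and an aggregate without GROUP BY is never empty because it still returns one row over zero input rows. `Node`, `EMPTY` and `propagate_empty` are hypothetical names; this is not the DuckDB implementation.

```python
from dataclasses import dataclass, field

EMPTY = "EMPTY_RESULT"


@dataclass
class Node:
    """Hypothetical logical plan node."""
    kind: str
    children: list["Node"] = field(default_factory=list)
    has_groups: bool = False  # only meaningful for AGGREGATE


def is_empty(node: Node) -> bool:
    return node.kind == EMPTY


def propagate_empty(node: Node) -> Node:
    """Rewrite children first, then replace this node with an empty result
    if its semantics guarantee zero output rows."""
    node.children = [propagate_empty(child) for child in node.children]
    kids = node.children
    if node.kind in ("FILTER", "PROJECTION", "ORDER_BY", "LIMIT"):
        empty = is_empty(kids[0])
    elif node.kind == "INNER_JOIN":
        empty = is_empty(kids[0]) or is_empty(kids[1])
    elif node.kind == "LEFT_JOIN":
        # An empty right side still yields every left row (padded with NULLs).
        empty = is_empty(kids[0])
    elif node.kind == "UNION_ALL":
        empty = all(is_empty(k) for k in kids)
    elif node.kind == "AGGREGATE":
        # Without GROUP BY, an aggregate over zero rows returns one row (e.g. COUNT(*) = 0).
        empty = node.has_groups and is_empty(kids[0])
    else:
        empty = False
    return Node(EMPTY) if empty else node
```

Keeping the rule next to each operator kind, instead of in filter pull-up and pushdown, means a new operator only needs one branch here.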
There are aggregation functions that are available in DuckDB, but duckplyr still falls back to dplyr for them. This was discovered when benchmarking duckplyr with the db-benchmark. This example comes from group by query...
This is a WIP PR to add duckplyr. Things that still need to be worked out for the group by queries: Q8: Query = `ans% select(id6, largest2_v3=v3) %>% filter(!is.na(largest2_v3)) %>% arrange(desc(largest2_v3)) %>%...
Fixes https://github.com/duckdblabs/duckdb-internal/issues/3268