Cory Grinstead
I haven't yet been able to identify a single bottleneck, but it seems like there are at least a few culprits. - copying/moving data during `concat` _(I think this is...
couldn't we evaluate the aggregate, then repeat the value over the other column's length? ```py from daft import col import daft df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5,...
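A minimal sketch of that idea in plain Python, not Daft's actual implementation or API: compute the aggregate once, then repeat the resulting scalar so it matches the length of the other column. The helper name `broadcast_aggregate` and the sample values are hypothetical and only for illustration.

```python
from typing import Callable, List, Sequence


def broadcast_aggregate(
    values: Sequence[int],
    agg: Callable[[Sequence[int]], int],
    target_len: int,
) -> List[int]:
    """Evaluate `agg` once over `values`, then repeat the scalar `target_len` times."""
    scalar = agg(values)           # the aggregate collapses the column to one value
    return [scalar] * target_len   # repeat it to line up with the other column


# Illustrative data (an assumption, not taken from the original example).
a = [1, 2, 3]
b = [10, 20, 30, 40]

a_sum = broadcast_aggregate(a, sum, len(b))
result = {"a_sum": a_sum, "b": b}
```

The aggregate is only evaluated once, so the cost of the repeat step is proportional to the length of the non-aggregated column.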
Some benchmarks using the "customer" table from TPC-H at scale factor 5. Included `polars` as a point of reference. ```py # polars with projection pl.scan_ndjson('./customer.json').select("c_mktsegment").collect() # daft with projection daft.read_json('./customer.json').select("c_mktsegment").collect() # polars...
> we noticed the same issue for our parquet reader and [added a "local" path](https://github.com/Eventual-Inc/Daft/blob/a0fd6ecaeb1b592fcb0e9cb5b94f3d56d7b73c68/src/daft-parquet/src/read.rs#L92) that checks if the file is local and then uses a parquet reader that is...
> So the optimizer, as I'm sure you know, doesn't pass sort information to table providers/data sources, so while I'm not wed to this particular approach, you're proposing something that isn't...
marking as draft as it's not actively waiting on review
marking as draft as it's not actively waiting on review. @melbourne2991 please feel free to ping us when it is ready.
marking this as draft as I don't think this is actively waiting on review
upstream issue: https://github.com/sqlparser-rs/sqlparser-rs/issues/892
> So if the desired outcome of the issue is "change the language we use to talk about this feature," then I think we can make this change. I think...