Jay Chia
Closing this in favor of #2913
cc @skrawcz as well
We can tackle append + overwrite first, and make a separate ticket for overwrite_partitions
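For context, a hypothetical sketch of what the write modes could look like on `write_parquet` — the `write_mode` parameter name and its values are assumptions about the proposed API, not a confirmed signature:

```python
import daft

df = daft.from_pydict({"a": [1, 2, 3], "part": ["x", "x", "y"]})

# Assumed API: write_mode is a proposed parameter, not a confirmed signature
df.write_parquet("out/", write_mode="append")     # add new files alongside existing data
df.write_parquet("out/", write_mode="overwrite")  # replace everything under the target path

# overwrite_partitions (deferred to a separate ticket) would replace only the
# partitions present in the incoming dataframe:
# df.write_parquet("out/", partition_cols=["part"], write_mode="overwrite_partitions")
```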
> Idea from @Fokko - support day/month/year transforms first

You can also try using the transforms that Daft has already implemented. Full list of transforms:

* [Expression.partitioning.days](https://www.getdaft.io/projects/docs/en/latest/api_docs/doc_gen/expression_methods/daft.Expression.partitioning.days.html)
* [Expression.partitioning.hours](...
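For anyone trying this out, a minimal sketch of applying those transforms (the column names and data here are made up for illustration):

```python
import datetime
import daft

df = daft.from_pydict({
    "ts": [datetime.datetime(2024, 1, 1, 3), datetime.datetime(2024, 6, 15, 18)],
    "value": [1, 2],
})

# Derive Iceberg-style partition values from a timestamp column
df = df.with_column("ts_days", daft.col("ts").partitioning.days())
df = df.with_column("ts_hours", daft.col("ts").partitioning.hours())
df.show()
```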
I also did some very rough benchmarks before/after making the rowgroups nicer:

DuckDB:

```
Before
Code block 'Run duckdb query 1' took: 5.84075 s
Code block 'Run duckdb query 2'...
```
I tried doing `.collect().write_parquet()` on a `SCALE_FACTOR=0.2` dataset - it seems to be better, but the rowgroups are still fairly fragmented (about 4MB compressed, 10MB uncompressed), and I'm also noticing some...
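For reference, one way to check rowgroup fragmentation like this is to read the Parquet footer with PyArrow (the file path below is a placeholder):

```python
import pyarrow.parquet as pq

# Placeholder path: point this at one of the files Daft wrote
md = pq.ParquetFile("out/part-0.parquet").metadata

for i in range(md.num_row_groups):
    rg = md.row_group(i)
    compressed = sum(rg.column(j).total_compressed_size for j in range(rg.num_columns))
    print(
        f"row group {i}: {rg.num_rows} rows, "
        f"{compressed / 1e6:.1f} MB compressed, "
        f"{rg.total_byte_size / 1e6:.1f} MB uncompressed"
    )
```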
> Though that recommendation doesn't mean better performance, I think. 512MB is very large and we could do a lot more in parallel if we shrink the sizes.
>
> Currently,...
> I think I will try to hit a row count rather than a row-group size (defaulting to 512^2).

There was recently an issue in Polars that allowed very small...
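As a point of comparison, PyArrow already caps rowgroups by row count rather than byte size; a minimal sketch of the "hit a row count" approach (using PyArrow here only for illustration, not Daft's internals):

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"a": list(range(1_000_000))})

# row_group_size caps the number of rows per rowgroup (here 512^2 = 262144),
# so rowgroup byte size follows from row width rather than being targeted directly
pq.write_table(table, "out.parquet", row_group_size=512**2)

md = pq.ParquetFile("out.parquet").metadata
print(md.num_row_groups)  # 4 rowgroups of ~262k rows for 1M rows
```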
(1) sounds like the most compelling integration point! Happy to explore what might make sense there.
I think struct and map types are fairly different:

```
dict[str, int]            # map type
{"foo": int, "bar": str}  # struct type
```

Maps can have any number of keys/values,...
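To make the distinction concrete, a small PyArrow sketch (PyArrow is just one convenient way to show it; the point isn't specific to that library):

```python
import pyarrow as pa

# Map: each row holds an arbitrary number of key/value pairs,
# but all keys share one type and all values share one type
map_arr = pa.array(
    [[("a", 1), ("b", 2)], [("c", 3)]],
    type=pa.map_(pa.string(), pa.int64()),
)

# Struct: every row has the same fixed set of named fields,
# and each field can have a different type
struct_arr = pa.array(
    [{"foo": 1, "bar": "x"}, {"foo": 2, "bar": "y"}],
    type=pa.struct([("foo", pa.int64()), ("bar", pa.string())]),
)

print(map_arr.type)     # map<string, int64>
print(struct_arr.type)  # struct<foo: int64, bar: string>
```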