Jay Chia issues

Results: 70 issues of Jay Chia

[FEAT] Propagate buffer size during dataframe partition iteration

This limits the depth of the "pipeline" when Daft executes `df.iter_partitions` partitions. The main refactor made here is that our PhysicalPlan is no longer responsible for keeping track of progress...

documentation

enhancement
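
The idea of a buffer size bounding the depth of the pipeline can be illustrated with a small sketch. This is not Daft's implementation: `iter_buffered`, `tasks`, and `buffer_size` are hypothetical names chosen for illustration, and a thread pool stands in for whatever runner actually executes partition tasks. The point is only that at most `buffer_size` tasks are in flight ahead of the consumer at any time.

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterable, Iterator, TypeVar

R = TypeVar("R")


def iter_buffered(tasks: Iterable[Callable[[], R]], buffer_size: int) -> Iterator[R]:
    """Yield task results in order, keeping at most `buffer_size` tasks in flight.

    Hypothetical helper for illustration only; not part of any real library.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")

    pending: Deque["Future[R]"] = deque()
    with ThreadPoolExecutor(max_workers=buffer_size) as pool:
        for task in tasks:
            # When the buffer is full, hand the oldest result to the consumer
            # before submitting more work. The generator pauses at `yield`,
            # so the consumer's pace bounds how far ahead execution runs.
            if len(pending) >= buffer_size:
                yield pending.popleft().result()
            pending.append(pool.submit(task))
        # Drain whatever is still in flight once the input is exhausted.
        while pending:
            yield pending.popleft().result()


if __name__ == "__main__":
    # Each "partition" here is just a function returning a number.
    partitions = [lambda i=i: i * i for i in range(5)]
    print(list(iter_buffered(partitions, buffer_size=2)))  # [0, 1, 4, 9, 16]
```

A larger `buffer_size` lets more work run ahead of the consumer at the cost of holding more results in memory; a value of 1 makes execution strictly lazy.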

Support reading from Azure with adlsv2 managed identity

**Is your feature request related to a problem? Please describe.** In Azure, sometimes users may use Azure managed identity instead of just pure credentials. Daft should support this. This may...

Reading only the partition column from an Iceberg/Delta table fails

**Describe the bug** When table formats such as Iceberg and Delta Lake store the data for a partition column, they will strip the column from the actual Parquet data files...

bug

iceberg

Iceberg Partitioned Write support

- [ ] Ability to override only specific partitions in the table (instead of whole table)
- [ ] Ability to write to a partitioned Iceberg Table

iceberg

Support partition evolution (old files having different partitioning schemes vs new files)

**Is your feature request related to a problem? Please describe.** Currently Daft makes an assumption that all files being retrieved from a given Iceberg table have the same partitioning: 1....

iceberg

Automatically use the Ray Runner if we detect that Ray has been initialized

**Is your feature request related to a problem? Please describe.** The current behavior is to log a warning, but we should perhaps just automatically use the Ray Runner if we...

bug

Fix poor performance on (local) Parquet files with many rowgroups

**Describe the bug** Daft's local Parquet reader is slow when reading Parquet files with many small rowgroups. The Polars Parquet writer currently writes files like that (attached a sample file...

bug

Add Expressions from Ibis

**Is your feature request related to a problem? Please describe.** We would like to add more expressions and kernels for functionality to eventually have parity with Ibis (https://ibis-project.org/reference/expression-numeric). **Generic** https://ibis-project.org/reference/expression-generic...

enhancement

good first issue

expression

Support reading from non-hierarchical Azure Blob Store storage accounts

**Is your feature request related to a problem? Please describe.** When reading certain Azure Blob Store storage accounts that are non-hierarchical, Daft fails with FileNotFound. See: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace More context: #1849