Jay Chia issues

Results: 70 issues of Jay Chia

[CHORE] Consolidate DataType.from_arrow logic into Rust

Closes #1958

chore

Syntactic sugar for nested getting in column names

**Is your feature request related to a problem? Please describe.** When retrieving nested columns in structs, we currently rely on the `Expression.struct.get(...)` accessor. However, for deeply nested structs this may...

Consolidate DataType inference from arrow types

Our DataType inference from Arrow types currently happens in separate places for the Python path and the Rust path. We should consolidate this for more predictable behavior.

Window function support

**Is your feature request related to a problem? Please describe.** Window functions: functions that are applied over **windows** of data. Here is a great illustration from DuckDB: ![image](https://github.com/Eventual-Inc/Daft/assets/17691182/22271273-f0ec-4829-a9c5-8a4d18adedf7) Valid expressions...

Pivot/Unpivot functionality

**Is your feature request related to a problem? Please describe.** **Pivot**: Converts rows into columns **Unpivot**: Converts columns into rows **Tasks** - [ ] Pivot with explicit values passed...
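To make the two definitions above concrete, here is a minimal pure-Python sketch of pivot and unpivot over a list of row dictionaries. It is not Daft's API: the function names, parameter names (`index`, `names_from`, `values_from`, `names_to`, `values_to`) and sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pivot(rows, index, names_from, values_from):
    """Long -> wide: each distinct value of `names_from` becomes a column."""
    wide = defaultdict(dict)
    for row in rows:
        # Group by the index key; the name column supplies the new column label.
        wide[row[index]][row[names_from]] = row[values_from]
    return [{index: key, **cols} for key, cols in wide.items()]


def unpivot(rows, index, value_columns, names_to="variable", values_to="value"):
    """Wide -> long: each listed column becomes one row per index key."""
    return [
        {index: row[index], names_to: col, values_to: row[col]}
        for row in rows
        for col in value_columns
        if col in row
    ]


# Hypothetical sample data in long form.
long_rows = [
    {"id": 1, "metric": "height", "reading": 10},
    {"id": 1, "metric": "width", "reading": 20},
    {"id": 2, "metric": "height", "reading": 30},
    {"id": 2, "metric": "width", "reading": 40},
]

wide_rows = pivot(long_rows, "id", "metric", "reading")
round_trip = unpivot(wide_rows, "id", ["height", "width"], "metric", "reading")
```

In this sketch, unpivoting the pivoted rows recovers the original long-form rows, which is one way to sanity-check that the two operations are inverses on complete data.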

[DOCS] Add feature-of-the-week IOConfigs

documentation

[Feature] Add Expression helper to fill NA

# Summary

It is useful for convenience to have a `.fillna` function to fill all null/NaN values in a column

## Proposal

```
df["x"].float.fillnan(0.5)
df["x"].fillnull(0.5)
df["x"].fillna(0.5)
```

The above expressions...

enhancement

good first issue

expression
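The fill-NA proposal above distinguishes null values from NaN values. A minimal pure-Python sketch of that distinction follows. It does not use Daft, and the per-function semantics are an assumption inferred from the proposed names (`fillnan` replaces only NaN, `fillnull` replaces only nulls, `fillna` replaces both); the helper names are hypothetical.

```python
import math


def is_nan(value):
    """True only for float NaN; None (null) is deliberately not NaN."""
    return isinstance(value, float) and math.isnan(value)


def fill_nan(values, fill):
    # Assumed semantics: replace NaN, leave nulls untouched.
    return [fill if is_nan(v) else v for v in values]


def fill_null(values, fill):
    # Assumed semantics: replace nulls, leave NaN untouched.
    return [fill if v is None else v for v in values]


def fill_na(values, fill):
    # Assumed semantics: replace both nulls and NaN.
    return [fill if v is None or is_nan(v) else v for v in values]


# Hypothetical column holding one null and one NaN.
column = [1.0, None, float("nan"), 4.0]

only_nan_filled = fill_nan(column, 0.5)
only_null_filled = fill_null(column, 0.5)
both_filled = fill_na(column, 0.5)
```

Under these assumptions, only `fill_na` yields a column with no missing values; the other two each leave one kind of missing value in place.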

[EPIC] Better code modularization

This Epic tracks issues relating to better code modularization within Daft. - [x] #1131 - [x] #1132 - [x] #1173

modularity

Fix round-trip write/read issues from Parquet/CSV/JSON

This long-running issue records any bugs found with a roundtrip write + read from formats such as Parquet and CSV. Tests were added here: #1616 ### Parquet - [ ]...

bug

good first issue

Write a guide on partitioning

Write a guide to enumerate key concepts around partitioning: ``` Increasing the number of partitions in your DataFrame has the following effects: 1. Increase the amount of parallelism available to...

documentation

data-catalogs