Jay Chia

Results: 70 issues by Jay Chia

**Is your feature request related to a problem? Please describe.** When users run `df.count()`, they often expect `df.count_rows()` behavior. Instead, `df.count()` will perform a count aggregation on every column, which...

p1
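To make the distinction in the `df.count()` issue above concrete, here is a minimal plain-Python sketch that does not use Daft at all. The `table` dict and the helpers `count_per_column` and `count_total_rows` are hypothetical names invented for illustration, and it assumes that a count aggregation skips nulls, which the truncated issue text does not confirm.

```python
# Hypothetical in-memory table: column name -> list of values (None marks a null).
table = {
    "a": [1, None, 3],
    "b": ["x", "y", None],
}


def count_per_column(columns):
    """Count aggregation applied to every column (assumed to skip nulls)."""
    return {
        name: sum(value is not None for value in values)
        for name, values in columns.items()
    }


def count_total_rows(columns):
    """Row count: a single number for the whole table."""
    first_column = next(iter(columns.values()), [])
    return len(first_column)


per_column = count_per_column(table)
total_rows = count_total_rows(table)

print(per_column)  # one count per column
print(total_rows)  # one count for the table
```

The first result has one entry per column, while the second is a single integer, which is roughly the mismatch in expectations the issue describes.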

**Is your feature request related to a problem? Please describe.** When writing data, Daft currently performs an append by default. We should additionally provide options to: 1. Overwrite the entire...

p2

**Is your feature request related to a problem? Please describe.** Daft currently only correctly understands the `**` recursive wildcard when it is applied to a folder segment. I.e. * `s3://bucket/**.csv`...

good first issue

**Is your feature request related to a problem? Please describe.** Currently, Daft aggregation syntax is a little loose and is modelled after PyArrow.
```python
df.agg([(col("a"), "agg-string")])
```
This has a...

enhancement
module: query-plan-v2

**Describe the bug** When running on a remote cluster via Ray client, progress bars seem to be broken:
```
(SchedulerActor pid=180, ip=10.0.66.234) Exception in thread 0d287252-6ae5-445b-9b96-5e412af6ab5d:
(SchedulerActor pid=180, ip=10.0.66.234) Traceback...
```

**Is your feature request related to a problem? Please describe.** We should ensure that Daft is compatible with the bleeding edge of Ray development. We can do so by running...

Hi, I noticed that the generated Parquet files are extremely fragmented in terms of rowgroups. This likely indicates a bug/issue in the Polars Parquet writer, but definitely also affects the...

Hey folks! Wondering if there's any interest in leveraging Daft (www.getdaft.io) for offline featurization? We're built with native Ray integrations, so I thought that there would be some natural synergies...