Jay Chia issues

Results: 70 issues by Jay Chia

Fix df.count() behavior to perform count_rows instead

**Is your feature request related to a problem? Please describe.** When users run `df.count()`, they often expect `df.count_rows()` behavior. Instead, `df.count()` will perform a count aggregation on every column, which...
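The distinction described above can be illustrated without Daft itself. The following is a minimal sketch in plain Python, not Daft's implementation: the `Table` alias and both helper functions are hypothetical, and it assumes that a count aggregation skips null values, which the excerpt does not state.

```python
from typing import Any

# Hypothetical stand-in for a dataframe: column name -> list of values.
Table = dict[str, list[Any]]


def count_per_column(table: Table) -> dict[str, int]:
    """Count aggregation applied to every column.

    Assumption: nulls (None) are skipped, as in typical SQL-style COUNT.
    """
    return {
        name: sum(value is not None for value in values)
        for name, values in table.items()
    }


def count_rows(table: Table) -> int:
    """Total number of rows, regardless of nulls."""
    if not table:
        return 0
    # All columns are assumed to have the same length.
    return len(next(iter(table.values())))


table = {"a": [1, None, 3], "b": ["x", "y", None]}
per_column = count_per_column(table)  # one count per column
total_rows = count_rows(table)        # a single number
```

Under these assumptions the per-column result is a mapping with one entry per column, while the row count is a single integer, which is the behavior the issue title asks `df.count()` to adopt.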

[FEAT] Allow for selection of append/overwrite/overwrite_partitions options when writing data

**Is your feature request related to a problem? Please describe.** When writing data, Daft currently performs an append by default. We should additionally provide options to: 1. Overwrite the entire...

Disallow glob paths with `**` that is not a folder segment

**Is your feature request related to a problem? Please describe.** Daft currently only correctly understands the `**` recursive wildcard when it is applied to a folder segment. I.e. * `s3://bucket/**.csv`...

good first issue
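One way the check named in the title of the glob issue could look is sketched below. This is only an illustration, not Daft's validation code: the function name `has_valid_recursive_wildcard` is hypothetical, and the example path `s3://bucket/**/*.csv` is an assumed "valid" case that does not appear in the issue excerpt.

```python
def has_valid_recursive_wildcard(path: str) -> bool:
    """Return True if every `**` in `path` occupies a whole folder segment."""
    # Drop a scheme such as "s3://" so its slashes are not treated as segments.
    _, sep, rest = path.partition("://")
    body = rest if sep else path
    for segment in body.split("/"):
        # A segment such as "**.csv" or "a**" mixes `**` with other characters.
        if "**" in segment and segment != "**":
            return False
    return True


# `**` fused with a file extension is not a folder segment.
rejected = has_valid_recursive_wildcard("s3://bucket/**.csv")
# `**` standing alone between slashes is a folder segment (assumed example).
accepted = has_valid_recursive_wildcard("s3://bucket/**/*.csv")
```

A caller could raise an error when the function returns `False`, instead of silently interpreting the pattern.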

Global Expressions: improved Aggregation syntax

**Is your feature request related to a problem? Please describe.** Currently, Daft aggregation syntax is a little loose and is modelled after PyArrow.

```python
df.agg([(col("a"), "agg-string")])
```

This has a...

enhancement

module: query-plan-v2

[DOCS] Fix struct accessors in tutorial examples

documentation

[BUG] Bug with tqdm progress bar display when running in Ray client mode with remote cluster

**Describe the bug** When running on a remote cluster via Ray client, progress bars seem to be broken: ``` (SchedulerActor pid=180, ip=10.0.66.234) Exception in thread 0d287252-6ae5-445b-9b96-5e412af6ab5d: (SchedulerActor pid=180, ip=10.0.66.234) Traceback...

Add Ray compatibility tests against Ray `main` to every commit in Daft `main`

Generated Parquet files are extremely fragmented

Using Daft for offline featurization

[FEAT] Add feature-flagged logical optimization pass to split Project into ActorPoolProject