Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
- Cheaper clones: I thought this might make sense since `Metadata` is already `Arc`'d, but I am not sure it is worthwhile; please validate: https://github.com/Eventual-Inc/Daft/blob/08ca9a4078e4506afc9b774bd2f073eee94a38d9/src/daft-schema/src/field.rs#L17
Would return the counts of each element in the lists, like pandas' `.value_counts()` or NumPy's `np.unique(return_counts=True)`. **Example:** ``` df = daft.from_pydict({"a": [[1, 2, 2, 3, 3, 3], [1,...
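As a rough illustration of the requested semantics (not Daft's actual API, which this feature request proposes), here is a plain-Python sketch of per-list element counts using `collections.Counter`:

```python
from collections import Counter

def list_value_counts(lists):
    """For each inner list, return a dict mapping element -> count.

    Analogous to the proposed per-list value_counts: one count mapping
    is produced per row, rather than one over the whole column.
    """
    return [dict(Counter(inner)) for inner in lists]

data = [[1, 2, 2, 3, 3, 3], [1, 1, 4]]
print(list_value_counts(data))
# → [{1: 1, 2: 2, 3: 3}, {1: 2, 4: 1}]
```

A native implementation would presumably return this as a map-typed column rather than Python dicts.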
1. Improves UDF documentation by adding API pages for `StatefulUDF` and `StatelessUDF`, with lots of docstrings and examples.
2. Moves our `daft.udf` module to `daft.udfs` instead, which avoids a naming...
Bumps [reqwest](https://github.com/seanmonstar/reqwest) from 0.11.27 to 0.12.7. Release notes Sourced from reqwest's releases. v0.12.7 What's Changed Revert adding impl Service<http::Request<_>> for Client. Full Changelog: https://github.com/seanmonstar/reqwest/compare/v0.12.6...v0.12.7 v0.12.6 What's Changed Add support for...
This PR assigns each actor that is spun up by the Python ActorPoolProject access to only certain GPUs, based on its rank. TODO: We should handle chains of model inference...
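A minimal sketch of the rank-based GPU assignment idea, assuming each actor gets a contiguous, non-overlapping slice of the visible GPUs (the function name and slicing scheme here are illustrative, not Daft's actual implementation):

```python
import os

def visible_gpus_for_rank(rank: int, gpus_per_actor: int, total_gpus: int) -> str:
    """Build a CUDA_VISIBLE_DEVICES value for an actor of the given rank.

    Actor 0 gets GPUs [0, gpus_per_actor), actor 1 the next slice, and so on,
    wrapping around if ranks exceed total_gpus / gpus_per_actor.
    """
    start = (rank * gpus_per_actor) % total_gpus
    ids = [(start + i) % total_gpus for i in range(gpus_per_actor)]
    return ",".join(str(i) for i in ids)

# With 4 GPUs and 2 GPUs per actor:
print(visible_gpus_for_rank(0, 2, 4))  # → 0,1
print(visible_gpus_for_rank(1, 2, 4))  # → 2,3

# An actor would pin itself before importing any CUDA-aware library:
os.environ["CUDA_VISIBLE_DEVICES"] = visible_gpus_for_rank(0, 2, 4)
```

Setting `CUDA_VISIBLE_DEVICES` early matters because most frameworks enumerate devices once at import time.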
**Describe the bug** When attempting to read a local Delta Lake table, Daft will log multiple errors and attempt to retrieve S3...
**Is your feature request related to a problem? Please describe.** `shuffle_aggregation_default_partitions` already exists to set the number of partitions to some sane default for an entire job. It would be...
**Is your feature request related to a problem? Please describe.** We use a PR labeller with the following: ``` * [FEAT]: adds the `enhancement` label * [PERF]: adds the `performance`...
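The title-prefix-to-label mapping quoted above can be sketched as a small regex lookup; the mapping and function names here are hypothetical, mirroring only the two rules visible in the request:

```python
import re

# Hypothetical mapping based on the labeller rules quoted in the request.
PREFIX_LABELS = {
    "FEAT": "enhancement",
    "PERF": "performance",
}

def labels_for_title(title: str) -> list:
    """Return the labels a `[PREFIX] ...` PR title would receive."""
    m = re.match(r"\[([A-Z]+)\]", title)
    if not m:
        return []
    label = PREFIX_LABELS.get(m.group(1))
    return [label] if label else []

print(labels_for_title("[FEAT] add list value_counts"))  # → ['enhancement']
print(labels_for_title("[PERF] faster shuffles"))        # → ['performance']
print(labels_for_title("fix typo"))                      # → []
```

In practice this kind of rule usually lives in a CI labeller configuration rather than application code.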
On https://www.getdaft.io/projects/docs/en/latest/faq/benchmarks.html there is no date for when the benchmark was executed, and no versions are listed.
We should add the ability to write back to Hugging Face; this will let people iterate more easily. There is some code in the [Spark docs](https://huggingface.co/docs/hub/main/datasets-spark#write) to upload data in a...