
Distributed DataFrame for Python designed for the cloud, powered by Rust

272 Daft issues, sorted by recently updated

- Cheaper clones: I thought this might make sense since `Metadata` is already `Arc`'d, but I am not sure.
- Not sure this makes sense, please validate: https://github.com/Eventual-Inc/Daft/blob/08ca9a4078e4506afc9b774bd2f073eee94a38d9/src/daft-schema/src/field.rs#L17

Would return the counts of each element in each list, like pandas `.value_counts()` or NumPy `np.unique(..., return_counts=True)`. **Example:** ``` df = daft.from_pydict({"a": [[1, 2, 2, 3, 3, 3], [1,...
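The requested semantics can be pinned down with a small plain-Python reference, independent of Daft. This is a sketch only: `list_value_counts` is a hypothetical name, not a Daft API, and the choice to return one dict per row (in first-seen order) and to pass `None` rows through unchanged are assumptions about how the feature might behave.

```python
from collections import Counter


def list_value_counts(rows):
    """Hypothetical reference semantics for a per-list value count.

    Returns one dict per row mapping element -> count, keyed in the order
    each element is first seen. A missing (None) row stays None.
    """
    result = []
    for row in rows:
        if row is None:
            result.append(None)
        else:
            # Counter is a dict subclass, so keys keep first-seen order.
            result.append(dict(Counter(row)))
    return result


# First row taken from the issue; the other rows are made up for illustration.
print(list_value_counts([[1, 2, 2, 3, 3, 3], [4, 4], None]))
# [{1: 1, 2: 2, 3: 3}, {4: 2}, None]
```

A Daft version would likely return a list of structs or a map-typed column instead of Python dicts; the dict form here only fixes the expected counts.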

1. Improves UDF documentation by adding API pages for `StatefulUDF` and `StatelessUDF`, with lots of docstrings and examples.
2. Moves our `daft.udf` module to `daft.udfs` instead, which avoids a naming...

documentation
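The stateless/stateful distinction that the new API pages document can be illustrated in plain Python. This is not Daft's API: `add_one`, `ScaleByFactor` and `run_udf` are hypothetical names, and "batch" is modelled as a Python list. The point is only that a stateless UDF is a pure function applied to every batch, while a stateful UDF pays its setup cost once and reuses that state across batches.

```python
# Plain-Python illustration of stateless vs stateful UDFs.
# NOT Daft's API; all names below are hypothetical.

def add_one(batch):
    """Stateless: a pure function applied independently to every batch."""
    return [x + 1 for x in batch]


class ScaleByFactor:
    """Stateful: setup runs once in __init__, then is reused per batch."""

    def __init__(self, factor):
        # Stand-in for costly initialization such as loading a model.
        self.factor = factor
        self.calls = 0

    def __call__(self, batch):
        self.calls += 1
        return [x * self.factor for x in batch]


def run_udf(udf, batches):
    """Apply a UDF (plain function or callable instance) to each batch."""
    return [udf(batch) for batch in batches]


batches = [[1, 2], [3]]
print(run_udf(add_one, batches))  # [[2, 3], [4]]
scaler = ScaleByFactor(10)
print(run_udf(scaler, batches))   # [[10, 20], [30]]
print(scaler.calls)               # 2
```

One instance of the stateful class served both batches, which is why initialization cost is amortized; a real engine may create several such instances (for example one per worker).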

Bumps [reqwest](https://github.com/seanmonstar/reqwest) from 0.11.27 to 0.12.7.
Release notes: sourced from reqwest's releases.
v0.12.7, What's Changed: Revert adding impl Service<http::Request<_>> for Client. Full Changelog: https://github.com/seanmonstar/reqwest/compare/v0.12.6...v0.12.7
v0.12.6, What's Changed: Add support for...

dependencies
rust

This PR assigns each actor that is spun up by the Python ActorPoolProject access to only certain GPUs, based on its rank. TODO: We should handle chains of model inference...

enhancement
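A rank-based GPU assignment can be sketched as giving each actor a contiguous block of device ids and exposing only those to the process through `CUDA_VISIBLE_DEVICES`. The PR text does not say which scheme it uses, so the contiguous-block layout, the `gpus_per_actor` parameter and the helper names below are assumptions for illustration.

```python
import os


def gpus_for_rank(rank, gpus_per_actor, total_gpus):
    """Return the GPU ids owned by the actor with this rank.

    Hypothetical scheme: actor r gets ids [r*k, r*k + k), k = gpus_per_actor.
    """
    if gpus_per_actor <= 0:
        raise ValueError("gpus_per_actor must be positive")
    start = rank * gpus_per_actor
    end = start + gpus_per_actor
    if rank < 0 or end > total_gpus:
        raise ValueError(
            f"rank {rank} needs GPUs {start}..{end - 1}, only {total_gpus} available"
        )
    return list(range(start, end))


def restrict_visible_gpus(rank, gpus_per_actor, total_gpus, env=os.environ):
    """Limit this process to its GPUs via CUDA_VISIBLE_DEVICES.

    Must run before any CUDA context is created in the process, otherwise
    the variable has no effect.
    """
    ids = gpus_for_rank(rank, gpus_per_actor, total_gpus)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return ids


# 4 GPUs, 2 per actor: rank 0 -> [0, 1], rank 1 -> [2, 3].
print(gpus_for_rank(0, 2, 4), gpus_for_rank(1, 2, 4))
```

Raising on an out-of-range rank, rather than wrapping around, keeps two actors from silently sharing a device; a round-robin scheme would be the alternative if oversubscription is acceptable.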

**Describe the bug** When attempting to read a local Delta Lake table, Daft logs multiple errors and attempts to retrieve S3...

**Is your feature request related to a problem? Please describe.** `shuffle_aggregation_default_partitions` already exists to set the number of partitions to some sane default for an entire job. It would be...

**Is your feature request related to a problem? Please describe.** We use a PR labeller with the following rules: ``` * [FEAT]: adds the `enhancement` label * [PERF]: adds the `performance`...

good first issue
github_actions
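The prefix-to-label rules quoted above amount to a lookup on the start of the PR title. The sketch below includes only the two rules visible in the issue; `PREFIX_LABELS` and `labels_for_title` are hypothetical names, and the real labeller's configuration format is not shown here.

```python
# Hypothetical prefix -> label mapping. Only the two rules quoted in the
# issue are included; further rules would be added the same way.
PREFIX_LABELS = {
    "[FEAT]": "enhancement",
    "[PERF]": "performance",
}


def labels_for_title(title):
    """Return the labels a PR title should receive based on its prefix."""
    stripped = title.lstrip()
    return [
        label
        for prefix, label in PREFIX_LABELS.items()
        if stripped.startswith(prefix)
    ]


print(labels_for_title("[FEAT]: add list value counts"))  # ['enhancement']
print(labels_for_title("fix typo in docs"))               # []
```

Matching is case-sensitive and anchored at the start of the title, so `[feat]` or a prefix in the middle of the title would not be labelled.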

On https://www.getdaft.io/projects/docs/en/latest/faq/benchmarks.html, there is no date indicating when the benchmarks were run, and no software versions are listed.
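One way to address this is to capture the run date and package versions automatically whenever benchmarks execute, and publish them with the results. A minimal sketch using only the standard library; `benchmark_run_info` is a hypothetical helper, and `getdaft` as the PyPI distribution name is an assumption (pass whichever distributions are being compared).

```python
import platform
from datetime import datetime, timezone
from importlib import metadata


def benchmark_run_info(packages=("getdaft",)):
    """Collect the run date and software versions to publish with results.

    The default distribution name is an assumption; pass the packages
    actually being benchmarked.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "run_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "python": platform.python_version(),
        "packages": versions,
    }


print(benchmark_run_info(("pandas", "pyarrow")))
```

Recording `None` for missing packages, instead of raising, lets the same helper run on machines where only some of the compared engines are installed.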

We should add the ability to write back to Hugging Face (HF); this would let people iterate more easily. There is some code in the [Spark docs](https://huggingface.co/docs/hub/main/datasets-spark#write) to upload data in a...