Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
- Cheaper clones: I thought this might make sense since `Metadata` is already `Arc`'d, but I am not sure it is worthwhile; please validate: https://github.com/Eventual-Inc/Daft/blob/08ca9a4078e4506afc9b774bd2f073eee94a38d9/src/daft-schema/src/field.rs#L17
Would return the counts of each element in the lists, like pandas' `.value_counts()` or NumPy's `np.unique(return_counts=True)`. **Example:** ``` df = daft.from_pydict({"a": [[1, 2, 2, 3, 3, 3], [1,...
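As a rough illustration of the requested semantics (not Daft's actual API, which this feature request proposes), here is a plain-Python sketch of per-list element counts using `collections.Counter`:

```python
from collections import Counter

def list_value_counts(lists):
    """For each inner list, return a dict mapping element -> count.

    Analogous to the proposed per-list value_counts: one count mapping
    is produced per row, rather than one over the whole column.
    """
    return [dict(Counter(inner)) for inner in lists]

data = [[1, 2, 2, 3, 3, 3], [1, 1, 4]]
print(list_value_counts(data))
# → [{1: 1, 2: 2, 3: 3}, {1: 2, 4: 1}]
```

A native implementation would presumably return this as a map-typed column rather than Python dicts.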
1. Improves UDF documentation by adding API pages for `StatefulUDF` and `StatelessUDF`, with lots of docstrings and examples.
2. Moves our `daft.udf` module to `daft.udfs` instead, which avoids a naming...
Bumps [reqwest](https://github.com/seanmonstar/reqwest) from 0.11.27 to 0.12.7. Release notes Sourced from reqwest's releases. v0.12.7 What's Changed Revert adding impl Service<http::Request<_>> for Client. Full Changelog: https://github.com/seanmonstar/reqwest/compare/v0.12.6...v0.12.7 v0.12.6 What's Changed Add support for...
This PR assigns each actor that is spun up by the Python ActorPoolProject access to only certain GPUs, based on its rank. TODO: We should handle chains of model inference...
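A minimal sketch of the rank-based GPU assignment idea, assuming each actor gets a contiguous, non-overlapping slice of the visible GPUs (the function name and slicing scheme here are illustrative, not Daft's actual implementation):

```python
import os

def visible_gpus_for_rank(rank: int, gpus_per_actor: int, total_gpus: int) -> str:
    """Build a CUDA_VISIBLE_DEVICES value for an actor of the given rank.

    Actor 0 gets GPUs [0, gpus_per_actor), actor 1 the next slice, and so on,
    wrapping around if ranks exceed total_gpus / gpus_per_actor.
    """
    start = (rank * gpus_per_actor) % total_gpus
    ids = [(start + i) % total_gpus for i in range(gpus_per_actor)]
    return ",".join(str(i) for i in ids)

# With 4 GPUs and 2 GPUs per actor:
print(visible_gpus_for_rank(0, 2, 4))  # → 0,1
print(visible_gpus_for_rank(1, 2, 4))  # → 2,3

# An actor would pin itself before importing any CUDA-aware library:
os.environ["CUDA_VISIBLE_DEVICES"] = visible_gpus_for_rank(0, 2, 4)
```

Setting `CUDA_VISIBLE_DEVICES` early matters because most frameworks enumerate devices once at import time.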
**Describe the bug** When attempting to read a local Delta Lake table, Daft will log multiple errors and attempt to retrieve S3...
**Is your feature request related to a problem? Please describe.** `shuffle_aggregation_default_partitions` already exists to set the number of partitions to some sane default for an entire job. It would be...
**Is your feature request related to a problem? Please describe.** We use a PR labeller with the following: ``` * [FEAT]: adds the `enhancement` label * [PERF]: adds the `performance`...
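The title-prefix-to-label mapping quoted above can be sketched as a small regex lookup; the mapping and function names here are hypothetical, mirroring only the two rules visible in the request:

```python
import re

# Hypothetical mapping based on the labeller rules quoted in the request.
PREFIX_LABELS = {
    "FEAT": "enhancement",
    "PERF": "performance",
}

def labels_for_title(title: str) -> list:
    """Return the labels a `[PREFIX] ...` PR title would receive."""
    m = re.match(r"\[([A-Z]+)\]", title)
    if not m:
        return []
    label = PREFIX_LABELS.get(m.group(1))
    return [label] if label else []

print(labels_for_title("[FEAT] add list value_counts"))  # → ['enhancement']
print(labels_for_title("[PERF] faster shuffles"))        # → ['performance']
print(labels_for_title("fix typo"))                      # → []
```

In practice this kind of rule usually lives in a CI labeller configuration rather than application code.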
On https://www.getdaft.io/projects/docs/en/latest/faq/benchmarks.html there is no date for when the benchmark was executed, and no versions are listed.
We should add the ability to write back to Hugging Face; this will let people iterate more easily. There is some code in the [Spark docs](https://huggingface.co/docs/hub/main/datasets-spark#write) to upload data in a...