Distributed DataFrame for Python designed for the cloud, powered by Rust

Results: 272 Daft issues

**Is your feature request related to a problem? Please describe.** Currently there is an optimization inlined inside [translate_single_logical_node](https://github.com/universalmind303/Daft/blob/158291c66e03be9b2252a428f826b6e78c2fb30a/src/daft-plan/src/physical_planner/translate.rs#L52). I think it'd make the code a bit easier to reason about...

**Is your feature request related to a problem? Please describe.** A lot of sources support reading a range instead of just a limit.

**Describe the solution you'd like** Modify the...
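A limit is the special case of a range that starts at row 0, so a source that can read rows `[start, end)` can absorb both an offset and a limit in a single read. The sketch below illustrates this with a hypothetical in-memory `RangeSource` and a hypothetical `pushdown_offset_limit` helper; neither is a Daft API, and a real source would map the range onto row groups or byte offsets.

```py
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical source that can serve an arbitrary row range, standing in for
# file formats that support range reads (not a Daft class).
@dataclass
class RangeSource:
    rows: List[int]

    def read_range(self, start: int, end: int) -> List[int]:
        # Only rows in [start, end) are materialized.
        return self.rows[start:end]


def pushdown_offset_limit(source: RangeSource, offset: int, limit: Optional[int]) -> List[int]:
    """Turn OFFSET/LIMIT into one range read (hypothetical helper)."""
    if offset < 0 or (limit is not None and limit < 0):
        raise ValueError("offset and limit must be non-negative")
    start = offset
    # No limit means "read to the end of the source".
    end = len(source.rows) if limit is None else offset + limit
    return source.read_range(start, end)
```

A plain limit is `pushdown_offset_limit(src, 0, n)`, so the existing limit pushdown becomes one case of the more general range pushdown.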

**Is your feature request related to a problem? Please describe.** If a plan contains nested unions/concats, we can instead flatten those into a single operation. Example:

```py
df.concat(df.concat(df.concat(df))).explain(True)
```

which...
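A minimal sketch of the flattening rule, assuming a toy plan built from hypothetical `Scan` and `Concat` nodes (not Daft's actual logical plan types). Because concatenation is associative, a nested `Concat` child can be spliced into its parent as long as input order is preserved:

```py
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Scan:
    name: str


@dataclass
class Concat:
    inputs: List["Plan"] = field(default_factory=list)


Plan = Union[Scan, Concat]


def flatten_concat(plan: Plan) -> Plan:
    """Rewrite nested Concat nodes into one n-ary Concat (toy plan, hypothetical)."""
    if isinstance(plan, Scan):
        return plan
    flat: List[Plan] = []
    for child in plan.inputs:
        child = flatten_concat(child)
        if isinstance(child, Concat):
            # Splice the child's inputs in place; keeping their order keeps
            # the output row order unchanged.
            flat.extend(child.inputs)
        else:
            flat.append(child)
    return Concat(flat)
```

For the plan built by `df.concat(df.concat(df.concat(df)))`, the three nested concats collapse into one `Concat` with four inputs.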

**Is your feature request related to a problem? Please describe.** Most other rule-based execution engines have some form of expression simplification. Some common optimizations:

- inline constant expressions such...

Label: performance
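As an illustration of the first item, inlining constant expressions (constant folding) can be written as a bottom-up rewrite. The classes `Lit`, `Col`, `BinOp` and the function `fold_constants` are hypothetical toy types for this sketch, not Daft's expression API:

```py
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, BinOp]

# Operators this toy simplifier knows how to evaluate at planning time.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr: Expr) -> Expr:
    """Evaluate every sub-expression whose operands are all literals."""
    if isinstance(expr, BinOp):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(_OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    # Literals and column references are already as simple as they get.
    return expr
```

With this rule, `col("a") + (1 + 2)` becomes `col("a") + 3`, so the literal arithmetic runs once at planning time instead of once per row.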

**Is your feature request related to a problem? Please describe.** I'd like to compute various distance/similarity functions:

- [ ] `cosine`
- [ ] `dot_product`
- [ ] `euclidean`
- ...
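For reference, the per-row semantics of the requested functions can be sketched in plain Python. The function names below are assumptions; note that SciPy's `cosine` returns a distance (1 minus the similarity), so the naming would need to be pinned down. A Daft implementation would operate on list/embedding columns rather than Python lists.

```py
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; cosine distance is 1 minus this value."""
    norms = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / norms
```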

**Is your feature request related to a problem? Please describe.** The optimizer should remove identical subplans that are used in different parts of the plan. Example:

```py
from daft import col

df.select(
    col('a').str.contains('foo').alias('a_contains_foo'),
    col('a').str.contains('foo').alias('a_contains_foo2')
)
```

...
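A minimal sketch of the idea, assuming hypothetical frozen `Col`/`Call` expression nodes so that structurally identical expressions hash and compare equal. It only deduplicates whole projection expressions; a full common-subexpression pass would also match shared sub-trees inside larger expressions.

```py
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple[object, ...]


def eliminate_common_subexprs(projections: List[Tuple[str, object]]):
    """Compute each distinct expression once; later duplicates become aliases.

    Returns (compute, aliases): `compute` lists (alias, expr) pairs that must be
    evaluated, and `aliases` maps every output name to the computed column it reads.
    """
    first_alias: Dict[object, str] = {}
    compute: List[Tuple[str, object]] = []
    aliases: List[Tuple[str, str]] = []
    for alias, expr in projections:
        if expr in first_alias:
            # Structurally identical to an earlier projection: reuse its result.
            aliases.append((alias, first_alias[expr]))
        else:
            first_alias[expr] = alias
            compute.append((alias, expr))
            aliases.append((alias, alias))
    return compute, aliases
```

For the example above, `str.contains('foo')` would be evaluated once as `a_contains_foo`, and `a_contains_foo2` would become a projection of that column.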

This would involve spinning up a Ray cluster in a container that Daft would connect to. Several paths to test:

- [ ] `ray.init()` not called, `daft.context.set_runner_ray()` called with local...

In #2393 we added three additional code paths for authenticating to Google Cloud: a credentials file path, a credentials string, and an OAuth2 token. It would be good to write integration tests for...

**Is your feature request related to a problem? Please describe.** I was trying to do something simple like read from a public S3 bucket:

```py
import daft

daft.read_lance('s3://daft-public-data/lance/words-test-dataset').collect()
```

but it took...

Labels: documentation, good first issue

**Describe the bug** Getting the following error when calling `write_deltalake`:

```
File /opt/conda/lib/python3.11/site-packages/daft/table/table_io.py:691, in write_deltalake.<locals>.file_visitor(written_file)
    689 def file_visitor(written_file: Any) -> None:
    690     path, partition_values = get_partitions_from_path(written_file.path)
--> 691     stats = get_file_stats_from_metadata(written_file.metadata)
...
```