Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
**Is your feature request related to a problem? Please describe.** Currently there is an optimization inlined inside [translate_single_logical_node](https://github.com/universalmind303/Daft/blob/158291c66e03be9b2252a428f826b6e78c2fb30a/src/daft-plan/src/physical_planner/translate.rs#L52). I think it'd make the code a bit easier to reason about...
**Is your feature request related to a problem? Please describe.** A lot of sources support reading a range instead of just a limit. **Describe the solution you'd like** Modify the...
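A plain-Python sketch of why a range read can beat a limit. `InMemorySource`, `read_range`, `limit_then_slice`, and `range_pushdown` are hypothetical names used only for illustration; they are not Daft APIs. The assumption is that a source which only understands a limit must produce `offset + limit` rows and let the engine drop the first `offset`, whereas a range-capable source returns just the requested rows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InMemorySource:
    """Hypothetical source that can serve any contiguous row range directly."""

    rows: List[int]
    rows_read: int = 0  # number of rows this source has actually produced

    def read_range(self, offset: int, limit: Optional[int]) -> List[int]:
        # Clamp the requested [offset, offset + limit) window to the available rows.
        stop = len(self.rows) if limit is None else min(offset + limit, len(self.rows))
        start = min(offset, stop)
        out = self.rows[start:stop]
        self.rows_read += len(out)
        return out


def limit_then_slice(source: InMemorySource, offset: int, limit: int) -> List[int]:
    # Only a limit is pushed down: read the first offset + limit rows,
    # then discard the leading `offset` rows after the fact.
    prefix = source.read_range(0, offset + limit)
    return prefix[offset:]


def range_pushdown(source: InMemorySource, offset: int, limit: int) -> List[int]:
    # The whole range is pushed down: the source returns exactly the requested rows.
    return source.read_range(offset, limit)


if __name__ == "__main__":
    a, b = InMemorySource(list(range(100))), InMemorySource(list(range(100)))
    print(limit_then_slice(a, 90, 5), a.rows_read)  # [90, 91, 92, 93, 94] 95
    print(range_pushdown(b, 90, 5), b.rows_read)    # [90, 91, 92, 93, 94] 5
```

Both strategies return the same rows; the difference is that the limit-only path makes the source produce 95 rows to yield 5.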
**Is your feature request related to a problem? Please describe.** If a plan contains nested unions/concats, we can instead flatten them into a single operation. Example:
```py
df.concat(df.concat(df.concat(df))).explain(True)
```
which...
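The flattening rule described above can be sketched as a recursive rewrite over a toy plan tree. `Scan` and `Concat` here are hypothetical stand-ins, not Daft's logical plan types; the sketch assumes an n-ary `Concat` whose inputs are emitted in order, so splicing a child's inputs into its parent preserves row order.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf node standing in for any non-concat input plan."""
    name: str


@dataclass(frozen=True)
class Concat:
    """Hypothetical n-ary concat node; its inputs are emitted in order."""
    inputs: Tuple["Plan", ...]


Plan = Union[Scan, Concat]


def flatten_concat(plan: Plan) -> Plan:
    """Collapse nested Concat nodes into one Concat, preserving input order."""
    if not isinstance(plan, Concat):
        return plan
    flat = []
    for child in plan.inputs:
        child = flatten_concat(child)  # flatten bottom-up first
        if isinstance(child, Concat):
            flat.extend(child.inputs)  # splice the child's inputs in place
        else:
            flat.append(child)
    return Concat(tuple(flat))


if __name__ == "__main__":
    df = Scan("df")
    # Mirrors df.concat(df.concat(df.concat(df))): three nested binary concats.
    nested = Concat((df, Concat((df, Concat((df, df))))))
    print(flatten_concat(nested))
```

Running it prints a single `Concat` with four `Scan` inputs, which is the shape a flattened plan would show in `explain`.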
**Is your feature request related to a problem? Please describe.** Most other rule-based execution engines have some form of expression simplification. Some common optimizations:
- inline constant expressions such...
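Inlining constant expressions (constant folding) can be sketched as a bottom-up rewrite that evaluates any operator whose operands are all literals. `Lit`, `Col`, `BinOp`, and `fold_constants` are hypothetical names for a toy expression tree, not Daft's expression API, and only three arithmetic operators are supported.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    """Hypothetical literal expression."""
    value: object


@dataclass(frozen=True)
class Col:
    """Hypothetical column reference."""
    name: str


@dataclass(frozen=True)
class BinOp:
    """Hypothetical binary operator expression."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, BinOp]

# Operators this sketch knows how to evaluate at planning time.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr: Expr) -> Expr:
    """Bottom-up: replace any BinOp whose operands fold to literals with a literal."""
    if isinstance(expr, BinOp):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(_OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr  # columns and literals are already simplest


if __name__ == "__main__":
    # col("a") + (1 + 2)  ->  col("a") + 3
    print(fold_constants(BinOp("+", Col("a"), BinOp("+", Lit(1), Lit(2)))))
```

Because the rewrite is bottom-up, nested constant subtrees collapse fully before their parent is examined, so a single pass is enough for this rule.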
**Is your feature request related to a problem? Please describe.** I'd like to compute various distance/similarity functions:
- [ ] `cosine`
- [ ] `dot_product`
- [ ] `euclidean`
- ...
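For reference, the three requested metrics have standard definitions, sketched below in plain Python for a single pair of vectors. The function names are hypothetical and this is not Daft's API; a Daft version would presumably apply the same math row-wise over an embedding or list column. Cosine distance, if that is the intended output, is `1 - cosine_similarity`.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); undefined when either vector is all zeros."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / norm


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the summed squared element-wise differences."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))   # 32
    print(euclidean_distance([0, 0], [3, 4]))  # 5.0
    print(cosine_similarity([1, 0], [0, 1]))   # 0.0
```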
**Is your feature request related to a problem? Please describe.** The optimizer should remove identical subplans used in different parts of the plan. Example:
```py
df.select(
    col('a').str.contains('foo').alias('a_contains_foo'),
    col('a').str.contains('foo').alias('a_contains_foo2')
)
```
...
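One common way to do this is common subexpression elimination (CSE): find expressions that occur more than once, compute each once in a lower projection, and have the outputs reference that result. The sketch below is a simplified, single-level version over hypothetical nested-tuple expressions (`("col", name)`, `("lit", value)`, `(op, *args)`); it is not Daft's representation, and the generated `__cse_N` column names are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterator, Tuple

# Hypothetical expression encoding: hashable nested tuples, e.g.
# ("contains", ("col", "a"), ("lit", "foo")).
Expr = tuple
_LEAVES = ("col", "lit")


def _subexprs(expr: Expr) -> Iterator[Expr]:
    """Yield every non-leaf subexpression of `expr`, children before parents."""
    if expr[0] in _LEAVES:
        return
    for child in expr[1:]:
        yield from _subexprs(child)
    yield expr


def _rewrite(expr: Expr, dups: set, names: Dict[Expr, str]) -> Expr:
    """Top-down: replace the largest duplicated subexpression with a column ref."""
    if expr in dups:
        if expr not in names:
            names[expr] = f"__cse_{len(names)}"  # hypothetical generated name
        return ("col", names[expr])
    if expr[0] in _LEAVES:
        return expr
    return (expr[0], *(_rewrite(c, dups, names) for c in expr[1:]))


def eliminate_common_subexprs(
    projections: Dict[str, Expr],
) -> Tuple[Dict[str, Expr], Dict[str, Expr]]:
    """Return (pre_projection, rewritten_outputs) for a single projection."""
    counts = Counter(s for e in projections.values() for s in _subexprs(e))
    dups = {s for s, n in counts.items() if n > 1}
    names: Dict[Expr, str] = {}
    outputs = {k: _rewrite(e, dups, names) for k, e in projections.items()}
    pre = {name: e for e, name in names.items()}  # only subexpressions actually used
    return pre, outputs


if __name__ == "__main__":
    contains = ("contains", ("col", "a"), ("lit", "foo"))
    pre, out = eliminate_common_subexprs(
        {"a_contains_foo": contains, "a_contains_foo2": contains}
    )
    print(pre)  # contains(...) computed once as __cse_0
    print(out)  # both outputs reference ("col", "__cse_0")
```

For the example in this issue, `contains(col('a'), 'foo')` is hoisted once and both aliases read the hoisted column. A full optimizer would also handle subexpressions nested inside other hoisted expressions, which this single-level sketch does not.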
This would involve spinning up a Ray cluster in a container that Daft would connect to. Several paths to test:
- [ ] `ray.init()` not called, `daft.context.set_runner_ray()` called with local...
In #2393 we added three additional code paths for authenticating to Google Cloud: credentials file path, credentials string, and OAuth2 token. It would be good to write integration tests for...
**Is your feature request related to a problem? Please describe.** I was trying to do something simple like reading from a public S3 bucket:
```py
daft.read_lance('s3://daft-public-data/lance/words-test-dataset').collect()
```
but it took...
**Describe the bug** Getting the following error when calling `write_deltalake`:
```
File /opt/conda/lib/python3.11/site-packages/daft/table/table_io.py:691, in write_deltalake.<locals>.file_visitor(written_file)
    689 def file_visitor(written_file: Any) -> None:
    690     path, partition_values = get_partitions_from_path(written_file.path)
--> 691     stats = get_file_stats_from_metadata(written_file.metadata)
...
```