Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
Our current search isn't very good. Use something like Algolia for search, and maybe add search-as-you-type.
Take the cube root of a numeric column: `df["a"].cbrt()`. See https://github.com/Eventual-Inc/Daft/pull/2180 for a reference PR.
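To make the intended semantics concrete, here is a minimal plain-Python sketch of a real-valued cube root. It does not use Daft; the helper name `cbrt` and the sample values are hypothetical, and how the proposed `df["a"].cbrt()` would treat nulls or non-float types is not specified in the issue.

```python
import math


def cbrt(x: float) -> float:
    """Real cube root that also works for negative inputs (illustrative helper)."""
    # In Python 3, a negative float raised to a fractional power yields a
    # complex number, so take the root of the magnitude and restore the sign.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


# Hypothetical sample column values.
values = [27.0, -8.0, 0.0, 2.0]
roots = [cbrt(v) for v in values]
```

The sign handling is the main design point: a naive `x ** (1 / 3)` does not return a real number for negative inputs.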
## Features
- [x] Basic reads
- [x] Partitioned reads
- [x] Basic writes
- [ ] https://github.com/Eventual-Inc/Daft/issues/1954
- [Blocked by delta-rs](https://github.com/delta-io/delta-rs/issues/1094)
- [x] Utilizing statistics from table metadata for...
**Is your feature request related to a problem? Please describe.**
I want to write to a single parquet file

**Describe the solution you'd like**
```py
daft.read_parquet("./my_file.parquet").write_parquet('my_file_new.parquet')
```
Currently this writes...
**Describe the bug** I am trying to use the `write_deltalake` function with a where clause for a Timestamp comparison. **To Reproduce** Steps to reproduce the behavior: ``` df = ( daft.read_deltalake("abfss://[email protected]/yy")...
We need more tests that cover:
- [ ] Writing and then reading data (consistency)
- [ ] Reading and then writing data (complete workflow)
- [ ] More tests...
We currently skip some of our Iceberg tests ([here](https://github.com/Eventual-Inc/Daft/blob/de1a9a0c0052aa30f8bae3675226d0abb4ab61d7/tests/integration/iceberg/test_pyiceberg_written_table_load.py#L60) and [here](https://github.com/Eventual-Inc/Daft/blob/main/tests/io/iceberg/test_iceberg_writes.py#L118)) that do a write then a read because the functionality does not work. The underlying issue seems to be...
Add support for floor division of integers (`//` operator). Currently you would have to do true division, floor it, then cast from float to int. This is unwieldy and may...
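The workaround described above can be compared with a native `//` in plain Python. This sketch does not use Daft, and the helper name `floor_div_via_float` is hypothetical. It also shows one generally known hazard of routing integers through floats (precision loss above 2**53), which is not necessarily the concern the issue goes on to name.

```python
import math


def floor_div_via_float(a: int, b: int) -> int:
    """The three-step workaround: true division, floor, cast back to int."""
    return int(math.floor(a / b))


# For small values the workaround and `//` agree, including for negatives,
# where both round toward negative infinity.
small_workaround = floor_div_via_float(-7, 2)
small_native = -7 // 2

# Above 2**53 a float can no longer represent every integer, so the
# workaround silently drops the low bit while `//` stays exact.
big = 2**53 + 1
big_workaround = floor_div_via_float(big, 1)
big_native = big // 1
```

Here `small_workaround` and `small_native` are equal, while `big_workaround` differs from `big_native` by one.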
Follow-up from #2448. We can potentially provide a better value for `ScanTask::estimate_in_memory_size_bytes` when metadata hasn't been read by using the estimate from files where there is metadata to determine an...
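One possible reading of this idea, sketched in Python rather than Daft's Rust: derive an average in-memory to on-disk ratio from the files whose metadata is available and apply it to the rest. The issue text is truncated, so the ratio approach is an assumption; the `FileInfo` type, its fields, and the function body are all hypothetical and do not reflect the real `ScanTask` implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FileInfo:
    """Hypothetical stand-in for one file in a scan task."""
    on_disk_bytes: int
    in_memory_bytes: Optional[int]  # None when metadata hasn't been read


def estimate_in_memory_size_bytes(files: Sequence[FileInfo]) -> Optional[int]:
    """Estimate the total in-memory size, extrapolating from files with metadata."""
    known = [f for f in files if f.in_memory_bytes is not None]
    known_disk = sum(f.on_disk_bytes for f in known)
    if not known or known_disk == 0:
        # Nothing to extrapolate from, so no estimate is available.
        return None
    # Assumed heuristic: average inflation from on-disk to in-memory size.
    ratio = sum(f.in_memory_bytes for f in known) / known_disk
    total = 0.0
    for f in files:
        if f.in_memory_bytes is not None:
            total += f.in_memory_bytes
        else:
            total += f.on_disk_bytes * ratio
    return int(round(total))


# Hypothetical example: two files with metadata, one without.
files = [FileInfo(100, 300), FileInfo(200, 600), FileInfo(50, None)]
estimate = estimate_in_memory_size_bytes(files)
```

In this example the two known files inflate by a factor of 3, so the file without metadata is assumed to take 150 bytes in memory.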
The function `expr_has_agg` in [src/daft-dsl/src/expr.rs](https://github.com/Eventual-Inc/Daft/pull/2367/files/026ce5a6d7e51ea1cc327922ea89ce1a7c222cd5#diff-fb7a328d33045d9e3f6280478b2d0160f49291ea84bd0afceb951f4ee5a78e84) currently traverses the expression by matching on all the expression types. We could use a tree visitor pattern instead to simplify it.
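The suggestion can be illustrated with a small Python sketch, shown in Python rather than the Rust of `expr.rs`. It uses a generic tree walk in the spirit of a visitor, not a classic double-dispatch visitor. Each node only reports its children, and one traversal replaces the per-variant match. The `Expr` classes here are hypothetical and much smaller than Daft's real expression enum.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


class Expr:
    """Hypothetical expression base class; leaves have no children."""

    def children(self) -> Tuple["Expr", ...]:
        return ()


@dataclass
class Column(Expr):
    name: str


@dataclass
class Literal(Expr):
    value: object


@dataclass
class BinaryOp(Expr):
    op: str
    left: Expr
    right: Expr

    def children(self) -> Tuple[Expr, ...]:
        return (self.left, self.right)


@dataclass
class Agg(Expr):
    func: str
    arg: Expr

    def children(self) -> Tuple[Expr, ...]:
        return (self.arg,)


def walk(expr: Expr) -> Iterator[Expr]:
    """Yield every node in the tree, depth-first, without recursion."""
    stack = [expr]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children())


def expr_has_agg(expr: Expr) -> bool:
    """True if any node in the tree is an aggregation."""
    # any() stops at the first match, so the walk ends early.
    return any(isinstance(node, Agg) for node in walk(expr))


with_agg = BinaryOp("+", Column("a"), Agg("sum", Column("b")))
without_agg = BinaryOp("+", Column("a"), Literal(1))
```

With this shape, adding a new expression type only requires defining its `children`; `expr_has_agg` itself does not change.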