Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust

Results 272 Daft issues

Closes #1768 This is a POC for adding overwrite / overwrite partitions mode for our write methods. The idea is to collect all the file paths that were written across...

enhancement

Write a guide to enumerate key concepts around partitioning: ``` Increasing the number of partitions in your DataFrame has the following effects: 1. Increase the amount of parallelism available to...

documentation
data-catalogs

**Describe the bug** If a task crashes during a write on append mode, it will restart and write all the files again, leaving behind dirty files. **To Reproduce** Steps to...

**Is your feature request related to a problem? Please describe.** When users run `df.count()`, they often expect `df.count_rows()` behavior. Instead, `df.count()` will perform a count aggregation on every column, which...
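The distinction this request describes can be illustrated with a minimal pure-Python sketch. It is not Daft's implementation: `count_per_column` and `count_rows` below are hypothetical helpers that operate on a plain dict of columns, and the sketch assumes the count aggregation skips nulls, as is conventional in SQL-style engines.

```python
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]


def count_per_column(columns: Columns) -> Dict[str, int]:
    """Count non-null values in each column (assumed aggregation semantics)."""
    return {
        name: sum(value is not None for value in values)
        for name, values in columns.items()
    }


def count_rows(columns: Columns) -> int:
    """Return the number of rows, regardless of nulls."""
    if not columns:
        return 0
    # All columns are assumed to have the same length.
    return len(next(iter(columns.values())))


data = {"a": [1, None, 3], "b": ["x", "y", None]}
per_column = count_per_column(data)  # one count per column
n_rows = count_rows(data)            # a single integer
```

The per-column version returns one value for every column, while the row count returns a single number, which is what users reaching for `df.count()` often expect.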

p1

User-defined global expressions, similar to typical UDFs, are Python functions that users can use as expressions. However, what is different about global expressions is that they produce a value with...

Additional expressions:
- [ ] concat
- [ ] collect_list
- [ ] collect_set - same as collect_list but no duplicates
- [ ] distinct - Special in that it’s...
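The difference between `collect_list` and `collect_set` noted in the list can be sketched in plain Python. The two functions below are hypothetical stand-ins, not Daft expressions, and keeping first-seen order in the deduplicated result is an assumption made only for this illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence


def collect_list(keys: Sequence[Hashable], values: Sequence[Any]) -> Dict[Hashable, List[Any]]:
    """Gather every value of each group into a list, duplicates included."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return dict(groups)


def collect_set(keys: Sequence[Hashable], values: Sequence[Any]) -> Dict[Hashable, List[Any]]:
    """Like collect_list, but each value is kept at most once per group."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in zip(keys, values):
        if value not in groups[key]:
            groups[key].append(value)
    return dict(groups)


keys = ["a", "a", "b", "a"]
values = [1, 1, 2, 3]
as_lists = collect_list(keys, values)
as_sets = collect_set(keys, values)
```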

`DataFrame.groupby` should correctly accept list expressions. Expected behavior: ```python >>> df = daft.from_pydict({ ... "strings": ["a", "b", "c", "d"], ... "lists": [[1, 1, 1, 1], [1, 1, 1, 1], [2,...

Hey - so this might not be on the roadmap for Daft at all, but I thought it was worth asking about! Also, just to say, thanks for building this...

Allow regex in expressions. For example, to select all expressions that start with `c`: `df.select(col("c*"))`. To flatten a struct `c`: `df.select(col("c.*"))`. See: https://github.com/Eventual-Inc/Daft/discussions/1964
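A rough sketch of what wildcard selection could mean, using Python's standard `fnmatch` module on a list of column names. This is not Daft's API: `select_matching` is a hypothetical helper, glob-style matching is an assumption about the intended semantics of `col("c*")`, and representing struct fields as dotted names such as `c.x` is likewise assumed for illustration.

```python
from fnmatch import fnmatchcase
from typing import List


def select_matching(column_names: List[str], pattern: str) -> List[str]:
    """Return the column names that match a glob-style pattern, in order."""
    return [name for name in column_names if fnmatchcase(name, pattern)]


# Hypothetical schema: "c.x" and "c.y" stand for fields of a struct column "c".
columns = ["id", "city", "country", "c.x", "c.y"]

starts_with_c = select_matching(columns, "c*")   # every name beginning with "c"
struct_fields = select_matching(columns, "c.*")  # only the dotted struct fields
```

In `fnmatch` patterns the dot is a literal character, so `c.*` matches only names that begin with `c.`, whereas a true regular expression would treat `.` as "any character".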

In an effort to mitigate a max protobuf size (> 2 GB) error in Ray, we currently pass reduce task inputs as a list of object refs and `ray.get()` them...

performance
tech-debt
p3