Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust

Results 272 Daft issues

Currently agg_concat simply combines the strings without a delimiter, so the alternative would be to first collect them with agg_list and then do list.join with a delimiter, but it would be...
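
To illustrate the workaround described in this issue, here is a minimal plain-Python sketch of the two steps (collect each group's strings into a list, then join with a delimiter). It does not use the Daft API; the function name `concat_by_group` and the sample rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def concat_by_group(rows, delimiter=""):
    """Group (key, text) pairs by key, then join each group's strings.

    Hypothetical helper: it mimics "collect as a list, then join",
    not Daft's actual agg_concat implementation.
    """
    groups = defaultdict(list)
    for key, text in rows:
        groups[key].append(text)  # step 1: collect strings per group
    # step 2: join each collected list with the chosen delimiter
    return {key: delimiter.join(texts) for key, texts in groups.items()}


rows = [("a", "x"), ("b", "y"), ("a", "z")]
no_delimiter = concat_by_group(rows)          # strings run together
with_delimiter = concat_by_group(rows, "|")   # strings separated by "|"
```

With an empty delimiter the strings in each group are simply concatenated, which is the behaviour the issue describes; passing a delimiter gives the requested result in one call.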

I would like to get a minhash with alternative hash algorithms, such as the first four bytes of SHA1, as implemented in https://github.com/bigcode-project/bigcode-dataset/blob/main/near_deduplication/minhash_deduplication_spark.py. The deduplication rate is empirically much better...
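
As a rough sketch of what "the first four bytes of SHA1" could look like as a MinHash base hash, the following uses only the standard library. It is not Daft's implementation and is not copied from the linked script; the little-endian byte order, the Mersenne-prime permutation scheme, the seed, and all function names are assumptions made for illustration.

```python
import hashlib
import random

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the permutations (assumed scheme)
MAX_HASH = (1 << 32) - 1        # keep signature values within 32 bits


def sha1_hash32(data: bytes) -> int:
    """Return the first four bytes of SHA1(data) as an unsigned 32-bit int.

    Little-endian byte order is an assumption of this sketch.
    """
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "little")


def make_permutations(num_perm: int, seed: int = 42):
    """Draw (a, b) coefficient pairs for h -> (a * h + b) mod p."""
    rng = random.Random(seed)
    return [
        (rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
        for _ in range(num_perm)
    ]


def minhash(tokens, permutations):
    """Compute a MinHash signature over a collection of string tokens."""
    hashes = [sha1_hash32(t.encode("utf-8")) for t in set(tokens)]
    if not hashes:
        # Empty input: fall back to the maximum value for every slot.
        return [MAX_HASH] * len(permutations)
    return [
        min(((a * h + b) % MERSENNE_PRIME) & MAX_HASH for h in hashes)
        for a, b in permutations
    ]


perms = make_permutations(16)
sig_one = minhash(["the", "quick", "brown", "fox"], perms)
sig_two = minhash(["fox", "brown", "quick", "the"], perms)
```

Because the signature depends only on the set of tokens, `sig_one` and `sig_two` are identical; swapping `sha1_hash32` for another base hash is the only change needed to compare algorithms.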

When PySpark saves Parquet files to a folder with partitioning, it creates subfolders named partition=some_value. When I use Daft to read_parquet the parent folder, I would like to get...
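
The `partition=some_value` folder layout mentioned here can be decoded from a file path with a few lines of standard-library Python. This is only a sketch of the path convention, not of how Daft reads Parquet; the function name `parse_hive_partitions` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict:
    """Extract key=value directory segments from a file path.

    Hypothetical helper: values are returned as strings, with no
    type inference or unescaping.
    """
    partitions = {}
    # Skip the final component, which is the file name itself.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:  # only segments shaped like key=value
            partitions[key] = value
    return partitions


example = parse_hive_partitions("data/year=2024/month=01/part-0000.parquet")
```

Here `example` maps `year` to `"2024"` and `month` to `"01"`; a path with no `key=value` folders yields an empty dict.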

**Is your feature request related to a problem? Please describe.** - [ ] On the Docs homepage, add example tutorials for each of Data Engineering, Analytics and ML/AI training and...

https://github.com/Eventual-Inc/Daft/blob/main/src/daft-core/src/array/ops/groups.rs#L43

**Is your feature request related to a problem? Please describe.** I don't think we should use archaic python naming conventions to drive our DSL. Nearly all of our other functions...

**Is your feature request related to a problem? Please describe.** I want to flatten all columns in a struct into the top level. But it seems like I need to...

**Is your feature request related to a problem? Please describe.** For a column containing URLs, I'd like to parse them and extract relevant components. **Describe the solution you'd like** ```py...
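
For reference, the kind of components such a request is after can be shown with Python's standard `urllib.parse` module. This is a per-value sketch, not the API proposed in the issue (whose code sample is truncated above); the function name `url_components` and the example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def url_components(url: str) -> dict:
    """Split one URL string into commonly used components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,          # lower-cased host, no port
        "port": parts.port,              # int, or None if absent
        "path": parts.path,
        "query": parse_qs(parts.query),  # each key maps to a list of values
        "fragment": parts.fragment,
    }


components = url_components("https://example.com:8080/a/b?x=1&y=2#top")
```

Applied to a column, a function like this would run once per row and produce a struct-like record per URL.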

### Describe the bug I am trying to do this: ``` import daft df1 = daft.from_pydict({"a": [1, 2, 3], "b": ["foo", "bar", "baz"]}) df2 = daft.from_pydict({"a": [1, 2, 3], "c":...

bug
needs triage

Implement outer joins for Swordfish. (Yes, this PR is a little big. But: 1. at least tests run in CI now, so you don't need to just take my word...

enhancement