Daft
Distributed DataFrame for Python designed for the cloud, powered by Rust
Streaming writes for swordfish (Parquet + CSV only). Iceberg and Delta writes are here: https://github.com/Eventual-Inc/Daft/pull/2966 **Unpartitioned writes:** 1. Spawn NUM_CPU workers that are responsible for making write calls. 2. A...
Fix for #2878. Not sure how feasible it is to unit test this, since it would likely require access to a GPU. However, I tested this on a GPU machine...
**Is your feature request related to a problem? Please describe.** Apache Arrow has an interval type that would be helpful for various date operations. Unlike `duration`, which is an absolute...
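To illustrate why a calendar-aware interval differs from an absolute duration, here is a minimal sketch using only the Python standard library. It is not Daft or Arrow code; `add_months` is a hypothetical helper written for this example, and clamping the day to the end of the target month is one assumed convention among several possible ones.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months (hypothetical helper).

    Assumption: when the target month is shorter, the day is clamped
    to that month's last day.
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


start = date(2021, 1, 31)

# Absolute duration: always exactly 30 days, regardless of the calendar.
fixed = start + timedelta(days=30)

# Calendar interval: "one month later", whose length depends on the month.
shifted = add_months(start, 1)

print(fixed, shifted)
```

The two results differ because a month is not a fixed number of days, which is the kind of operation a duration type alone cannot express.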
This PR implements Iceberg MOR (merge-on-read) for streaming Parquet reads. Currently the tests are failing for test cases with empty scans, which is why this PR depends on: https://github.com/Eventual-Inc/Daft/pull/2918, which implements...
**Is your feature request related to a problem? Please describe.** I want to perform cross joins using Daft. **Describe the solution you'd like** `df1.join(df2, how='cross')` **Describe alternatives you've considered** `df1.join(df2,...
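For readers unfamiliar with the requested semantics, a cross join pairs every row of one table with every row of the other. The sketch below shows this on plain lists of dicts using the standard library; `cross_join` is a hypothetical function for illustration only and says nothing about how Daft would implement `how='cross'`.

```python
from itertools import product


def cross_join(left, right):
    """Return the Cartesian product of two row lists (hypothetical helper).

    Assumption: column names do not collide; on a collision the
    right-hand value would overwrite the left-hand one.
    """
    return [{**l_row, **r_row} for l_row, r_row in product(left, right)]


colors = [{"color": "red"}, {"color": "blue"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]

pairs = cross_join(colors, sizes)
for row in pairs:
    print(row)
```

The output has `len(left) * len(right)` rows, which is why cross joins are usually requested explicitly rather than produced by default.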
Addresses: https://github.com/Eventual-Inc/Daft/issues/2808 This PR enables adding file path as a column from file reads via the `file_path_column: str | None` parameter. This works by appending a column of the file...
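The idea of appending the source path as a column can be sketched with the standard library alone. This is not Daft's reader: `read_csvs_with_path` and the in-memory `files` mapping are assumptions made for the example, and only the `file_path_column: str | None` parameter name comes from the PR description.

```python
import csv
import io


def read_csvs_with_path(files, file_path_column=None):
    """Read several CSV sources into one list of row dicts.

    `files` maps a path string to CSV text (an in-memory stand-in for
    real files). When `file_path_column` is set, each row gains a
    column of that name holding the path it was read from.
    """
    rows = []
    for path, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            if file_path_column is not None:
                row[file_path_column] = path
            rows.append(row)
    return rows


files = {
    "data/a.csv": "id,value\n1,x\n2,y\n",
    "data/b.csv": "id,value\n3,z\n",
}

with_path = read_csvs_with_path(files, file_path_column="path")
without_path = read_csvs_with_path(files)
print(with_path)
```

Leaving the parameter as `None` keeps the original schema, so existing callers are unaffected.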
**Is your feature request related to a problem? Please describe.** I want to be able to cast a `timedelta` to a duration the same way we implicitly convert it if...
### Describe the bug unable to perform simple date math such as `col("date") + '1y'` ### To Reproduce ```py df = daft.from_pydict({ 'date' : ['2021-01-01', '2021-01-02', '2021-01-03'], }).select(daft.col("date").cast(daft.DataType.date())) df.select(daft.col("date") +...
**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** ```py df = (daft.from_pydict({ 'floats': [328.00, 327.00] }) .where(col('floats').cast(daft.DataType.decimal128(15, 2)) > 300) .collect() ) ---------------------------------------------------------------------------...
### Describe the bug Testing TPC-H queries using Daft. ### To Reproduce https://colab.research.google.com/drive/1hIsswquloAd_7E0UzsF6chdyuIPNo081#scrollTo=kQeSmaq5MqRD ### Expected behavior Even if it is not supported, I don't expect a crash. ### Component(s) SQL...