Fix round-trip write/read issues from Parquet/CSV/JSON

Open jaychia opened this issue 2 years ago • 2 comments

This long-running issue records any bugs found with a roundtrip write + read from formats such as Parquet and CSV.

Tests were added here: #1616

Parquet

  • [ ] Reading the FixedSizeList type is broken (looks like a bug in arrow2)
  • [x] Reading timezone-aware columns is broken. See: #1625
  • [x] Reading multiple files with the Tensor type is broken

CSV

  • [x] Reading the Decimal type from a CSV with floats in the column is broken. See: https://github.com/Eventual-Inc/Daft/pull/1626
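
For context on why the CSV Decimal case is easy to get wrong, here is a minimal sketch using only the Python standard library. It is not Daft's reader or the fix from the PR above, and the `roundtrip_csv` helper and the `amount` column are hypothetical names made up for illustration. It shows that a CSV file carries no type information, so Decimal values come back as float-looking text and the reader has to be told (or has to infer) the intended type.

```python
import csv
import io
from decimal import Decimal


def roundtrip_csv(rows, header):
    """Write rows to an in-memory CSV, then read them back as lists of strings.

    Hypothetical helper for illustration only; not part of Daft.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)  # Decimal values are written via str()
    buf.seek(0)
    reader = csv.reader(buf)
    read_header = next(reader)
    return read_header, [row for row in reader]


original = [[Decimal("1.50")], [Decimal("2")], [Decimal("0.125")]]
header, raw = roundtrip_csv(original, ["amount"])

# Every cell comes back as plain text; the Decimal type is gone.
all_text = all(isinstance(cell, str) for row in raw for cell in row)

# Restoring the type requires applying a schema explicitly on read.
restored = [[Decimal(cell)] for (cell,) in raw]
```

A round-trip test in this style compares `restored` against `original`; without the explicit conversion step, the comparison is between Decimals and strings and fails.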

jaychia, Nov 16 '23 21:11

@jaychia the test marked as skipped here, https://github.com/Eventual-Inc/Daft/blob/d2f28d6ea8dcd41a909a6ac0ea88603dafbd8e8b/tests/io/test_parquet_roundtrip.py#L107, is working fine; maybe it has been resolved.

murex971, Mar 19 '24 17:03

Oh yes -- good point! Interesting, something seems to have fixed it in the meantime :)

Removed the skipping:

https://github.com/Eventual-Inc/Daft/pull/2024

jaychia, Mar 19 '24 21:03