Fix round-trip write/read issues from Parquet/CSV/JSON
This long-running issue tracks bugs found when round-tripping data (a write followed by a read) through formats such as Parquet, CSV, and JSON.
Tests were added here: #1616
Parquet
- [ ] Reading FixedSizeList type is broken (looks like a bug in arrow2)
- [x] Reading timezone-aware columns is broken. See: #1625
- [x] Reading multiple files with the Tensor type is broken
CSV
- [x] Reading the Decimal type from a CSV with floats in the column is broken. See: https://github.com/Eventual-Inc/Daft/pull/1626
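For anyone adding cases to this list, the general shape of a round-trip check can be sketched as below. This is only an illustration using the Python standard library, not Daft's reader or the tests from #1616; the helper names `roundtrip_csv` and `parse_decimal_column` are hypothetical. It shows why a Decimal column should be parsed from its text form rather than through a float.

```python
import csv
import io
from decimal import Decimal


def roundtrip_csv(rows, header):
    """Write rows to an in-memory CSV, then read them back as raw strings.

    Hypothetical helper for illustration; not part of Daft.
    """
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)

    buf.seek(0)
    reader = csv.reader(buf)
    read_header = next(reader)
    return read_header, [row for row in reader]


def parse_decimal_column(raw_rows, col):
    """Parse one column as Decimal directly from text.

    Going through float first (Decimal(float(s))) would introduce
    binary rounding error, so the string is passed to Decimal as-is.
    """
    return [Decimal(row[col]) for row in raw_rows]


# Values that look like floats in the CSV text but must stay exact.
original = [Decimal("1.10"), Decimal("2.5"), Decimal("-0.001")]

header, raw = roundtrip_csv([[d] for d in original], ["amount"])
restored = parse_decimal_column(raw, 0)

# The round trip is considered correct only if values match exactly.
assert restored == original
```

The same write, read, compare pattern applies to the Parquet cases above, with the format-specific writer and reader swapped in.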
@jaychia the test marked as skipped here https://github.com/Eventual-Inc/Daft/blob/d2f28d6ea8dcd41a909a6ac0ea88603dafbd8e8b/tests/io/test_parquet_roundtrip.py#L107 now passes for me, so the underlying bug may already have been resolved.
Oh yes -- good point! Interesting, something seems to have fixed it in the meantime :)
Removed the skipping:
https://github.com/Eventual-Inc/Daft/pull/2024