explorer
Proposal: Integration testing for CSV, IPC, NDJSON, Parquet
I think it would be worthwhile to add tests for CSV, IPC, NDJSON and Parquet, all using the same data.
Alternatively, Apache supplies Arrow and Parquet testing files, which Arrow2 tests against (Arrow2 is used by Polars). We could add these as submodules.
We could also just borrow the tests from Arrow2 for the various formats :o)
The use case is that we'd catch any weird issues when encoding data into and out of Elixir, for example https://github.com/elixir-nx/explorer/issues/283 . It would also ensure the datatypes work as expected, and if/when additional backends are added, a new backend could confirm compatibility by passing the tests for every format.
I would be happy to write these initial tests.
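To make the "same data for every format" idea concrete, here is a rough sketch in Python using only the standard library. It is not Explorer's API. The fixture (`ROWS`, `SCHEMA`) and the helper names are hypothetical, and only CSV and NDJSON are shown because IPC and Parquet would need an Arrow library.

```python
import csv
import io
import json

# Hypothetical shared fixture: the same rows are written to every format.
ROWS = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.25},
]

# Column name -> Python type, used to restore values that CSV returns as strings.
SCHEMA = {"id": int, "name": str, "score": float}


def csv_round_trip(rows):
    """Write rows to an in-memory CSV, read them back, and restore the types."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(SCHEMA))
    writer.writeheader()
    writer.writerows(rows)
    buf.seek(0)
    return [
        {col: SCHEMA[col](value) for col, value in row.items()}
        for row in csv.DictReader(buf)
    ]


def ndjson_round_trip(rows):
    """Write one JSON object per line, then parse each line back."""
    text = "\n".join(json.dumps(row) for row in rows)
    return [json.loads(line) for line in text.splitlines()]


# Assumption: each format is registered once and checked against the same fixture.
ROUND_TRIPS = {"csv": csv_round_trip, "ndjson": ndjson_round_trip}


def check_all_formats(rows):
    """Return, per format, whether the data survived the round trip unchanged."""
    return {name: fn(rows) == rows for name, fn in ROUND_TRIPS.items()}
```

A new format (or a new backend) would only need to add an entry to `ROUND_TRIPS`; the fixture and the comparison stay the same, which is what makes the formats comparable.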
Definitely! :+1:
For sure! Good idea.
I am starting to implement tests for the CSV part, based on the arrow2 test suite, which turns out to be a good inspiration.
We have started on this for CSV and Parquet. More will come as we add/remove features.