Proposal: Integration testing for CSV, IPC, NDJSON, Parquet

Open joshuataylor opened this issue 2 years ago • 3 comments

I think it would be worthwhile to add integration tests for CSV, IPC, NDJSON and Parquet, with all formats using the same data.

Alternatively, Apache supplies Arrow testing files and Parquet testing files, which Arrow2 tests against (Arrow2 is used by Polars). We could add these as submodules.

We could also just borrow the tests from Arrow2 for the various formats :o)

The use case is catching any odd issues when encoding data into or out of Elixir, for example https://github.com/elixir-nx/explorer/issues/283. It would ensure the datatypes work as expected, and if/when additional backends are added, a new backend can confirm compatibility by passing all the tests for the various formats.
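To make the "same data for every format" idea concrete, here is a minimal, language-agnostic sketch. It is written in Python with only the standard library, so it does not use Explorer or Polars. It covers only CSV and NDJSON, because Parquet and IPC would need a third-party library. The fixture `ROWS`, the `SCHEMA` mapping and the helper names are all hypothetical; they show the shape of such a test, not how the real suite would be written.

```python
import csv
import io
import json

# Hypothetical shared fixture: every format is tested against the same rows.
ROWS = [
    {"id": 1, "name": "alice", "score": 1.5},
    {"id": 2, "name": "bob", "score": 2.0},
]

# Expected type per column. CSV stores everything as text, so the reader
# needs this mapping to restore the original datatypes.
SCHEMA = {"id": int, "name": str, "score": float}


def csv_round_trip(rows):
    """Write rows as CSV text, read them back, and cast using SCHEMA."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(SCHEMA))
    writer.writeheader()
    writer.writerows(rows)
    buf.seek(0)
    return [
        {col: SCHEMA[col](val) for col, val in row.items()}
        for row in csv.DictReader(buf)
    ]


def ndjson_round_trip(rows):
    """Write one JSON object per line, then parse each line back."""
    text = "\n".join(json.dumps(row) for row in rows)
    return [json.loads(line) for line in text.splitlines()]


# Registry of formats; a new format (or backend) only has to add an entry.
ROUND_TRIPS = {"csv": csv_round_trip, "ndjson": ndjson_round_trip}


def same_values_and_types(left, right):
    """Compare rows by value and by type, since 1 == 1.0 in Python."""
    return left == right and all(
        type(a[col]) is type(b[col])
        for a, b in zip(left, right)
        for col in a
    )


def check_all(rows):
    """Return, per format, whether the round trip preserved the data."""
    return {
        name: same_values_and_types(round_trip(rows), rows)
        for name, round_trip in ROUND_TRIPS.items()
    }
```

Calling `check_all(ROWS)` reports, per format, whether both values and datatypes survived the round trip. Keeping the formats in one registry is what would let an additional backend reuse the same fixture and assertions.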

I would be happy to write these initial tests.

joshuataylor, Jul 04 '22 12:07

Definitely! :+1:

josevalim, Jul 04 '22 12:07

For sure! Good idea.

cigrainger, Jul 04 '22 18:07

I am starting to implement tests for the CSV part, based on the arrow2 test suite, which turns out to be a good inspiration.

thbar, Oct 01 '22 10:10

We have started on this for CSV and Parquet. More will come as we add/remove features.

josevalim, Nov 04 '22 07:11