
Add parquet-mr test

Open ZJONSSON opened this issue 7 years ago • 5 comments

Here is a very basic example of how we can use a dockerized parquet-tools (from parquet-mr) to test on Travis whether files created by parquetjs can be read by parquet-mr (and therefore by Spark etc.).
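For readers who want to try a similar check locally, here is a minimal sketch of the idea, not the code from this PR. The image name `example/parquet-tools` is a hypothetical placeholder, and the sketch assumes the image's entrypoint is the `parquet-tools` CLI, so that the arguments after the image name are passed straight to it. `dockerArgs` and `readWithParquetMr` are illustrative helper names, not part of parquetjs.

```javascript
const { spawnSync } = require('child_process');
const path = require('path');

// Hypothetical image name: substitute whatever image actually ships parquet-tools.
const IMAGE = 'example/parquet-tools';

// Build the `docker run` arguments that mount the file's directory read-only
// at /data and run a parquet-tools subcommand (default: `cat`) on the file.
// Assumes the image's entrypoint is the parquet-tools CLI.
function dockerArgs(file, command = 'cat', image = IMAGE) {
  const abs = path.resolve(file);
  return [
    'run', '--rm',
    '-v', `${path.dirname(abs)}:/data:ro`,
    image,
    command,
    `/data/${path.basename(abs)}`,
  ];
}

// Run parquet-tools against a file written by parquetjs and return its stdout.
// Throws if docker cannot be started or parquet-tools exits with a non-zero status.
function readWithParquetMr(file) {
  const res = spawnSync('docker', dockerArgs(file), { encoding: 'utf8' });
  if (res.error) throw res.error;
  if (res.status !== 0) {
    throw new Error(`parquet-tools failed: ${res.stderr}`);
  }
  return res.stdout;
}
```

A test could write a file with parquetjs, call `readWithParquetMr` on it, and compare the output against the rows that were written. A non-zero exit status then signals that parquet-mr could not read the file.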

The basic test succeeds but more advanced tests fail. I will add a failing branch that we can use as a guide for fixing any errors.


ZJONSSON avatar Feb 28 '18 01:02 ZJONSSON

Here is a failing branch: https://github.com/ZJONSSON/parquetjs/tree/parquet-mr-fail

The failures are problems with the RLE encoding.


ZJONSSON avatar Feb 28 '18 03:02 ZJONSSON

This PR has been rebased on https://github.com/ironSource/parquetjs/pull/57 to include fixes for RLE in dlevels and rlevels. More tests have been added to verify that the results are correct as seen from parquet-mr.
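For context on what "RLE in dlevels and rlevels" refers to, here is a minimal sketch of the run-length half of Parquet's RLE/bit-packing hybrid encoding, as described in the Parquet format documentation. It is not the parquetjs implementation and not the fix from the linked PR. The function names are illustrative, and the sketch emits only RLE runs (no bit-packed runs), which is still a valid stream.

```javascript
// Encode a non-negative integer as an unsigned LEB128 varint.
function varint(n) {
  const out = [];
  while (n > 0x7f) {
    out.push((n & 0x7f) | 0x80); // low 7 bits, continuation bit set
    n >>>= 7;
  }
  out.push(n);
  return out;
}

// Encode an array of levels as a sequence of RLE runs.
// Each run is: varint(runLength << 1), then the repeated value stored
// little-endian in ceil(bitWidth / 8) bytes. The low header bit being 0
// marks the run as RLE rather than bit-packed.
function encodeRleRuns(levels, bitWidth) {
  const valueBytes = Math.ceil(bitWidth / 8);
  const out = [];
  let i = 0;
  while (i < levels.length) {
    // Find how many times the current value repeats.
    let run = 1;
    while (i + run < levels.length && levels[i + run] === levels[i]) run++;
    out.push(...varint(run << 1));
    let v = levels[i];
    for (let b = 0; b < valueBytes; b++) {
      out.push(v & 0xff);
      v >>>= 8;
    }
    i += run;
  }
  return Buffer.from(out);
}
```

For example, the levels `[1, 1, 1, 0, 0]` with a bit width of 1 become two runs: three 1s, then two 0s. In version 1 data pages, the encoded levels are additionally prefixed with their byte length as a 4-byte little-endian integer, which this sketch leaves out.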

ZJONSSON avatar Mar 01 '18 02:03 ZJONSSON

I seem to be running into this issue as well. Are there any outstanding items on this PR that I might be able to help with to get it merged in?

justinsoliz avatar Jul 10 '18 17:07 justinsoliz

Do your problems go away when you use this branch? The only outstanding thing here is a code review afaik.

ZJONSSON avatar Jul 11 '18 00:07 ZJONSSON

Installing via npm as described in this comment does the trick for me: https://github.com/ironSource/parquetjs/issues/29#issuecomment-385808572

justinsoliz avatar Jul 11 '18 00:07 justinsoliz