PARQUET-246: Support reading data written while PARQUET-246 was active

Open • isnotinvain opened this issue 9 years ago • 1 comment

This is a reworked version of #217 from @spena that tries to generalize the fix a little and doesn't cache the previous value across columns or row groups.

I am working on building some tests that actually verify that this works. In addition to unit tests, I think this needs an integration test that runs over a corrupted file, ideally one that also throws dictionary encoding into the mix, because we can have a mix of encodings in a row group.

I think we also need to confirm that this corruption can't cross row group boundaries; otherwise this becomes even more complicated to recover from.

isnotinvain, Jun 30 '15 06:06

@isnotinvain, I think this was fixed by a later PR and can be closed.

rdblue, Oct 30 '15 23:10