parquet-java
PARQUET-246: Support reading data written while PARQUET-246 was active
This is a reworked version of #217 from @spena. It tries to generalize the fix a little, and it does not cache the previous value across columns or row groups.
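The following is a minimal sketch of that scoping. It is not the code in this PR. It assumes a prefix-plus-suffix (incremental) encoding, where each value stores how many leading bytes it shares with the previous value plus the remaining suffix. It also assumes the corruption is that the writer did not reset the previous value at a page boundary, so the first value of a page can reference the last value of the prior page. `PrefixCarryDecoder`, `Entry`, `startColumnChunk`, and `decodePage` are hypothetical names, not parquet-java APIs. The previous value is carried from page to page within one column chunk and reset at the start of every column chunk, so it never crosses columns or row groups.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PrefixCarryDecoder {
    // Hypothetical encoded value: reuse the first prefixLength bytes of the
    // previous value, then append suffix.
    public static final class Entry {
        final int prefixLength;
        final byte[] suffix;

        public Entry(int prefixLength, byte[] suffix) {
            this.prefixLength = prefixLength;
            this.suffix = suffix;
        }

        // Convenience constructor for UTF-8 string suffixes.
        public Entry(int prefixLength, String suffix) {
            this(prefixLength, suffix.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Last decoded value in the current column chunk. It is scoped to one
    // chunk, so it is never shared across columns or row groups.
    private byte[] previous = new byte[0];

    // Call before the first page of each column chunk (one column in one row group).
    public void startColumnChunk() {
        previous = new byte[0];
    }

    // Decodes one page. The previous value is NOT reset here, so the first
    // entry of a page may reuse bytes from the last value of the prior page.
    public List<String> decodePage(List<Entry> page) {
        List<String> out = new ArrayList<>();
        for (Entry e : page) {
            if (e.prefixLength < 0 || e.prefixLength > previous.length) {
                throw new IllegalStateException("prefix length " + e.prefixLength
                        + " exceeds previous value length " + previous.length);
            }
            byte[] value = new byte[e.prefixLength + e.suffix.length];
            System.arraycopy(previous, 0, value, 0, e.prefixLength);
            System.arraycopy(e.suffix, 0, value, e.prefixLength, e.suffix.length);
            out.add(new String(value, StandardCharsets.UTF_8));
            previous = value;
        }
        return out;
    }
}
```

For example, a page decoding to `apple`, `apply` followed by a page whose first entry is `(4, "ication")` decodes to `application`. After `startColumnChunk()`, the same kind of entry fails fast instead of silently reading state from another column or row group.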
I am working on tests that actually verify that this works. In addition to unit tests, I think this needs an integration test that runs over a corrupted file. Ideally that file would also mix in dictionary encoding, because a single row group can contain a mix of encodings.
I think we also need to confirm that this corruption cannot cross row group boundaries; otherwise, recovering from it becomes even more complicated.
@isnotinvain, I think this was fixed by a later PR and can be closed.