parquet-java
Cannot read parquet file that was generated from nanoparquet
Describe the bug, including details regarding any error messages, version, and platform.
When I call MessageColumnIO.getRecordReader on a file written by nanoparquet, the following exception is thrown:
Exception in thread "main" java.lang.UnsupportedOperationException
    at org.apache.parquet.column.values.ValuesReader.readInteger(ValuesReader.java:178)
    at org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:830)
    at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:663)
    at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:801)
    at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
    at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:43)
    at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:80)
    at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:282)
    at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:141)
    at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:105)
    at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:180)
    at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:105)
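For anyone reading the trace: the top frame is in the base class ValuesReader, not in a concrete reader. This suggests that whichever reader was selected does not override readInteger, so the call fell through to a default that throws. The sketch below is a simplified, stdlib-only model of that pattern, not parquet-java's actual source. All class and method names in it (BaseValuesReader, IntArrayReader, BooleanOnlyReader, nextLevel, describe) are hypothetical and exist only for illustration. It does not claim to identify which reader was chosen for the nanoparquet file, or why.

```java
public class LevelReaderSketch {

    /** Simplified stand-in for a base reader: every typed read is unsupported by default. */
    static abstract class BaseValuesReader {
        public int readInteger() {
            throw new UnsupportedOperationException();
        }

        public boolean readBoolean() {
            throw new UnsupportedOperationException();
        }
    }

    /** Hypothetical reader that overrides readInteger, so it can supply integer values. */
    static final class IntArrayReader extends BaseValuesReader {
        private final int[] values;
        private int pos = 0;

        IntArrayReader(int... values) {
            this.values = values;
        }

        @Override
        public int readInteger() {
            return values[pos++];
        }
    }

    /** Hypothetical reader that only overrides readBoolean; readInteger keeps the throwing default. */
    static final class BooleanOnlyReader extends BaseValuesReader {
        @Override
        public boolean readBoolean() {
            return true;
        }
    }

    /** Plays the role of the iterator frame in the trace: it asks the reader for the next int. */
    static int nextLevel(BaseValuesReader reader) {
        return reader.readInteger();
    }

    /** Reports either the value that was read or the reader type that could not provide it. */
    static String describe(BaseValuesReader reader) {
        try {
            return "level=" + nextLevel(reader);
        } catch (UnsupportedOperationException e) {
            return "unsupported: " + reader.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new IntArrayReader(1, 0)));   // level=1
        System.out.println(describe(new BooleanOnlyReader()));    // unsupported: BooleanOnlyReader
    }
}
```

In this model, the exception message is empty, as in the report above, because the default method throws without a message. The failure only surfaces when the value is first requested, which would be consistent with it appearing during reader construction rather than when the file is opened.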
I expected this to work. PyArrow, for example, can read the file.
Sample file can be found here:
Nanoparquet can be found here:
https://github.com/r-lib/nanoparquet
Version: 1.14.3
Platform: Linux
Component(s)
Core
Thanks for reporting this issue! I can confirm that I have reproduced it on my side. I will take a look later.