Cannot read parquet file that was generated from nanoparquet

Open RealTYPICAL opened this issue 1 year ago • 2 comments

Describe the bug, including details regarding any error messages, version, and platform.

When trying to use MessageColumnIO to get a record reader, the following error occurs:

Exception in thread "main" java.lang.UnsupportedOperationException
	at org.apache.parquet.column.values.ValuesReader.readInteger(ValuesReader.java:178)
	at org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:830)
	at org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:663)
	at org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:801)
	at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
	at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:43)
	at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:80)
	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:282)
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:141)
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:105)
	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:180)
	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:105)

I expected the file to be readable. PyArrow, for example, can read it.
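For readers unfamiliar with the trace: the top frame is `ValuesReader.readInteger` itself throwing `UnsupportedOperationException`, reached through an int iterator (`ValuesReaderIntIterator.nextInt`). One reading is that the concrete reader behind the iterator did not provide an integer read. The sketch below reproduces only that general pattern in self-contained Java. All class names in it are hypothetical and it is not parquet-java source. It makes no claim about which encoding in the nanoparquet file triggers the problem.

```java
// Hypothetical names throughout; a sketch of the failure pattern, not parquet-java code.
public class UnsupportedReadSketch {

    /** Base reader: typed reads are optional and throw unless a subclass overrides them. */
    static abstract class BaseValuesReader {
        int readInteger() {
            throw new UnsupportedOperationException();
        }

        long readLong() {
            throw new UnsupportedOperationException();
        }
    }

    /** A reader that supports only long reads, so readInteger falls through to the base. */
    static class LongOnlyReader extends BaseValuesReader {
        private long next = 0;

        @Override
        long readLong() {
            return next++;
        }
    }

    /** A reader that does support integer reads. */
    static class IntReader extends BaseValuesReader {
        private final int[] values;
        private int pos = 0;

        IntReader(int... values) {
            this.values = values;
        }

        @Override
        int readInteger() {
            return values[pos++];
        }
    }

    /** Adapter that pulls ints from whichever reader it was handed. */
    static class IntIterator {
        private final BaseValuesReader delegate;

        IntIterator(BaseValuesReader delegate) {
            this.delegate = delegate;
        }

        int nextInt() {
            return delegate.readInteger();
        }
    }

    public static void main(String[] args) {
        // Works: the delegate overrides readInteger.
        System.out.println(new IntIterator(new IntReader(1, 0, 1)).nextInt());

        // Fails the same way as the trace: the base method is reached and throws.
        try {
            new IntIterator(new LongOnlyReader()).nextInt();
        } catch (UnsupportedOperationException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

If this reading applies, the useful question for triage is which reader type the column reader selected for this file, and why it differs from what other writers produce.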

Sample file can be found here:

mtcars_np.zip

Nanoparquet can be found here:

https://github.com/r-lib/nanoparquet

Version: 1.14.3
Platform: Linux

Component(s)

Core

RealTYPICAL, Nov 06 '24 15:11

Thanks for reporting this issue! I can confirm that I have reproduced it on my side. I will take a look later.

wgtmac, Nov 15 '24 14:11