parquetjs
invalid encoding: PLAIN_DICTIONARY
The Issue
When trying to read a test parquet file fetched from an S3 bucket, I get an invalid encoding: PLAIN_DICTIONARY error. This comes after repeatedly getting an invalid parquet version error caused by corrupt files, so I take it as a sign that the file is now being recognized as a parquet file and is just not being read correctly. Is there anything I am doing incorrectly?
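Since corrupt downloads were part of the problem, one cheap sanity check before handing a file to the reader is to confirm the 4-byte PAR1 magic that unencrypted Parquet files carry at both the start and the end. The sketch below is only that: hasParquetMagic is a hypothetical helper name, not part of parquetjs, and it does not validate the footer metadata, the version, or any column encodings.

```javascript
const fs = require('fs');

// Unencrypted Parquet files begin and end with the ASCII bytes "PAR1".
const PARQUET_MAGIC = 'PAR1';

// Hypothetical helper (not a parquetjs API): returns true when the magic
// bytes are present at both ends of the file.
function hasParquetMagic(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    // Minimum layout: leading magic + 4-byte footer length + trailing magic.
    if (size < 12) return false;
    const head = Buffer.alloc(4);
    const tail = Buffer.alloc(4);
    fs.readSync(fd, head, 0, 4, 0);
    fs.readSync(fd, tail, 0, 4, size - 4);
    return (
      head.toString('latin1') === PARQUET_MAGIC &&
      tail.toString('latin1') === PARQUET_MAGIC
    );
  } finally {
    fs.closeSync(fd);
  }
}
```

A file that passes this check can still fail later with an unsupported-encoding error, which is consistent with the behaviour described above.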
The Code
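const parquet = require('parquetjs');

(async () => {
  try {
    // Open the file fetched from S3 and iterate over its rows
    const reader = await parquet.ParquetReader.openFile('./fetched3.parquet');
    const cursor = reader.getCursor();
    let record = null;
    while ((record = await cursor.next())) {
      console.log(record);
    }
    await reader.close();
  } catch (err) {
    console.error(err);
  }
})();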
I just realized that PLAIN and PLAIN_DICTIONARY are two different encodings. Are there any plans to support PLAIN_DICTIONARY encoding in the future?
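To make the difference concrete: PLAIN stores every value in row order, while dictionary encoding stores each distinct value once and then one index per row. The following is a conceptual sketch only. The function names are made up for illustration, and the real Parquet format writes the dictionary to its own page and packs the indices with an RLE/bit-packed hybrid scheme, which this toy version does not attempt.

```javascript
// PLAIN (conceptually): keep every value, in order.
function encodePlain(values) {
  return { values: values.slice() };
}

// Dictionary (conceptually): distinct values once, plus one index per row.
function encodeDictionary(values) {
  const dictionary = [];
  const positions = new Map(); // value -> index in dictionary
  const indices = values.map((v) => {
    if (!positions.has(v)) {
      positions.set(v, dictionary.length);
      dictionary.push(v);
    }
    return positions.get(v);
  });
  return { dictionary, indices };
}

// Rebuild the original column from the dictionary and the indices.
function decodeDictionary(encoded) {
  return encoded.indices.map((i) => encoded.dictionary[i]);
}
```

A reader that only understands the PLAIN layout has no way to interpret the index stream, which is why such a file is rejected rather than partially read.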
I would be interested in this too
Hey @zeitiger - Did you ever find a workaround for this?
I'm also getting this error, trying to read a parquet file created by AWS Wrangler (aka AWS SDK Pandas), no solution yet
Any updates on this?
For your information, the lib does not support RLE_DICTIONARY either. The workaround was to re-encode the file to PLAIN.
Also, it does not support float8 data types.