parquetjs

invalid encoding: PLAIN_DICTIONARY

Open ekschro opened this issue 4 years ago • 7 comments

The Issue

When I try to read a test Parquet file fetched from an S3 bucket, I get an invalid encoding: PLAIN_DICTIONARY error. Before this, I got an invalid parquet version error several times because the files were corrupt, so I take the new error as a sign that the file is now recognized as a Parquet file but is not being decoded correctly. Is there anything I am doing wrong?

The Code

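const parquet = require('parquetjs');

(async () => {
  let reader;
  try {
    reader = await parquet.ParquetReader.openFile('./fetched3.parquet');

    // Iterate over every record in the file
    const cursor = reader.getCursor();
    let record = null;
    while ((record = await cursor.next())) {
      console.log(record);
    }
  } catch (err) {
    console.error(err);
  } finally {
    // Release the file handle even if decoding fails
    if (reader) {
      await reader.close();
    }
  }
})();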

ekschro • Jun 30 '20
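As background to the reasoning in the original post: every Parquet file begins and ends with the 4-byte magic number PAR1, so a quick sanity check on a downloaded file (for example, one fetched from S3 that may have been truncated) is to look at both ends. The helper below, hasParquetMagic, is a hypothetical sketch and not part of parquetjs. Passing it only shows that the container looks like Parquet; it says nothing about whether a given reader supports the encodings used inside.

```javascript
// Hypothetical helper (not a parquetjs API): check for the "PAR1" magic
// number at both the start and the end of a byte buffer.
const PARQUET_MAGIC = Buffer.from('PAR1', 'ascii');

function hasParquetMagic(buf) {
  // Header magic (4 bytes) + footer length (4 bytes) + footer magic (4 bytes)
  // means a Parquet file is at least 12 bytes long.
  if (!Buffer.isBuffer(buf) || buf.length < 12) {
    return false;
  }
  const head = buf.subarray(0, 4);
  const tail = buf.subarray(buf.length - 4);
  return head.equals(PARQUET_MAGIC) && tail.equals(PARQUET_MAGIC);
}
```

For example, hasParquetMagic(require('fs').readFileSync('./fetched3.parquet')) should return true for an intact file and false for one that was cut off during download.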

I just realized that PLAIN and PLAIN_DICTIONARY are two different forms of encoding.

Are there any plans to support PLAIN_DICTIONARY encoding in the future?

ekschro • Jul 01 '20
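For context on the distinction above: with dictionary encoding, a column chunk stores each distinct value once in a dictionary page, and the data pages hold integer indices into that dictionary instead of the values themselves. In the Parquet format specification, PLAIN_DICTIONARY is the older, deprecated name; newer writers use PLAIN for the dictionary page and RLE_DICTIONARY for the data pages. The following is a conceptual sketch only. dictionaryEncode and dictionaryDecode are illustrative names, not parquetjs functions, and the sketch ignores the on-disk byte layout, where the indices are packed with the RLE/bit-packing hybrid encoding.

```javascript
// Conceptual sketch (not parquetjs code) of dictionary encoding:
// distinct values go into a dictionary, and each entry becomes an index.
function dictionaryEncode(values) {
  const dictionary = [];       // distinct values, in first-seen order
  const positions = new Map(); // value -> index into dictionary
  const indices = [];
  for (const v of values) {
    let idx = positions.get(v);
    if (idx === undefined) {
      idx = dictionary.length;
      dictionary.push(v);
      positions.set(v, idx);
    }
    indices.push(idx);
  }
  return { dictionary, indices };
}

// Reverse the encoding by looking each index up in the dictionary.
function dictionaryDecode({ dictionary, indices }) {
  return indices.map((i) => {
    if (i < 0 || i >= dictionary.length) {
      throw new RangeError(`dictionary index out of range: ${i}`);
    }
    return dictionary[i];
  });
}
```

For example, dictionaryEncode(['us', 'eu', 'us', 'ap', 'us']) yields the dictionary ['us', 'eu', 'ap'] and the indices [0, 1, 0, 2, 0]. The lookup in dictionaryDecode is the step that a reader implementing only PLAIN does not perform.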

I would be interested in this too.

zeitiger • Sep 23 '20

Hey @zeitiger, did you ever find a workaround for this?

ekschro • Oct 18 '21

I'm also getting this error when reading a Parquet file created by AWS Wrangler (aka AWS SDK for pandas). No solution yet.

mattfysh • Oct 10 '22

Any updates on this?

hackermondev • Jun 10 '23

For your information, the library does not support RLE_DICTIONARY either. The workaround was to re-encode the file using PLAIN encoding.

valdo404 • May 14 '24
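Regarding RLE_DICTIONARY: in the Parquet format specification, a data page using this encoding starts with one byte giving the bit width of the dictionary indices, followed by the indices in the RLE/bit-packing hybrid encoding. Each run begins with a ULEB128 varint header. If its lowest bit is 1, the run is bit-packed (header >> 1 groups of 8 values, packed starting from the least significant bit of each byte); otherwise it is an RLE run of header >> 1 copies of one value stored in ceil(bitWidth / 8) little-endian bytes. The sketch below decodes that index stream. It is a hypothetical illustration of what reader support would involve, not parquetjs code, and it assumes the buffer begins at the bit-width byte (repetition and definition levels already consumed, page already decompressed) and that count, the number of non-null values, is known from the page header.

```javascript
// Hypothetical sketch (not parquetjs code): decode the dictionary indices of
// an RLE_DICTIONARY data page using the RLE/bit-packing hybrid layout.

// Read an unsigned LEB128 varint at `pos`; returns the value and next position.
function readUleb128(buf, pos) {
  let value = 0;
  let shift = 0;
  for (;;) {
    if (pos >= buf.length) throw new RangeError('truncated varint');
    const byte = buf[pos++];
    value += (byte & 0x7f) * 2 ** shift; // arithmetic avoids 32-bit overflow
    if ((byte & 0x80) === 0) return { value, pos };
    shift += 7;
  }
}

// Decode `count` values of `bitWidth` bits from an RLE/bit-packed hybrid stream.
function decodeHybrid(buf, bitWidth, count) {
  const out = [];
  const byteWidth = Math.ceil(bitWidth / 8);
  let pos = 0;
  while (out.length < count) {
    const header = readUleb128(buf, pos);
    pos = header.pos;
    const runInfo = Math.floor(header.value / 2);
    if (header.value % 2 === 1) {
      // Bit-packed run: runInfo groups of 8 values, least significant bit first.
      const nBytes = runInfo * bitWidth;
      if (pos + nBytes > buf.length) throw new RangeError('truncated bit-packed run');
      for (let i = 0; i < runInfo * 8 && out.length < count; i++) {
        let v = 0;
        for (let b = 0; b < bitWidth; b++) {
          const bitPos = i * bitWidth + b;
          const bit = (buf[pos + (bitPos >> 3)] >> (bitPos & 7)) & 1;
          v += bit * 2 ** b;
        }
        out.push(v);
      }
      pos += nBytes;
    } else {
      // RLE run: runInfo repetitions of one little-endian value.
      if (pos + byteWidth > buf.length) throw new RangeError('truncated RLE run');
      let v = 0;
      for (let b = 0; b < byteWidth; b++) v += buf[pos + b] * 2 ** (8 * b);
      pos += byteWidth;
      for (let i = 0; i < runInfo && out.length < count; i++) out.push(v);
    }
  }
  return out;
}

// Entry point: the first byte is the index bit width (at most 32).
function decodeDictionaryIndices(page, count) {
  if (page.length < 1) throw new RangeError('empty data page');
  const bitWidth = page[0];
  if (bitWidth > 32) throw new RangeError(`invalid bit width: ${bitWidth}`);
  return decodeHybrid(page.subarray(1), bitWidth, count);
}
```

With the page bytes [0x02, 0x0a, 0x01, 0x03, 0xe4, 0x1b] (bit width 2, an RLE run of five 1s, then one bit-packed group of 8), decodeDictionaryIndices(page, 13) returns [1, 1, 1, 1, 1, 0, 1, 2, 3, 3, 2, 1, 0]. Each index would then be looked up in the values from the dictionary page.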

It also does not support float8 data types.

valdo404 • May 14 '24