
invalid encoding: PLAIN_DICTIONARY

Open EugeniuZ opened this issue 3 years ago • 2 comments

Hi,

The extension fails to load the attached parquet file (zipped because GitHub doesn't accept .parquet files). I am able to read the unzipped file with pandas.

The error in "Runtime Status" is "invalid encoding: PLAIN_DICTIONARY".

VS Code version: 1.70.2 (running on Ubuntu 22.04)
Extension version: v2.3.0
FJUL.zip

Regards, Eugeniu

EugeniuZ, Sep 06 '22 20:09

@EugeniuZ Data preview uses this TypeScript library for reading parquet data files:

https://github.com/kbajalc/parquets

At the time Data Preview was created, it was one of the few libraries that could read parquet files without depending on Python tools and toolchain.

It is quite possible that the library doesn't support plain dictionary encoding, which your parquet file uses.
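For context, dictionary encoding stores each distinct value once in a dictionary page and writes small integer indices into the data pages. Below is a minimal sketch of that idea, and of how a reader with no decoder registered for an encoding could end up raising this kind of error. All names here (`supportedDecoders`, `decodePage`, `decodeDictionary`) are hypothetical and are not taken from the parquets library; the real format also packs the indices with an RLE/bit-packed hybrid, which is skipped here.

```typescript
// Hypothetical sketch: not the API or internals of kbajalc/parquets.
type Encoding = "PLAIN" | "RLE" | "PLAIN_DICTIONARY";

type PageDecoder = (raw: number[]) => number[];

// Decoders this imaginary reader knows about.
// PLAIN_DICTIONARY is deliberately missing, to mimic an unsupported encoding.
const supportedDecoders: Partial<Record<Encoding, PageDecoder>> = {
  PLAIN: (raw) => raw.slice(), // values are stored as-is
};

// Look up a decoder by encoding name and fail loudly when none is registered.
function decodePage(encoding: Encoding, raw: number[]): number[] {
  const decoder = supportedDecoders[encoding];
  if (decoder === undefined) {
    throw new Error(`invalid encoding: ${encoding}`);
  }
  return decoder(raw);
}

// What a dictionary decoder has to do, conceptually:
// each index in the data page selects one value from the dictionary page.
function decodeDictionary<T>(dictionary: T[], indices: number[]): T[] {
  return indices.map((i) => {
    if (!Number.isInteger(i) || i < 0 || i >= dictionary.length) {
      throw new RangeError(`dictionary index out of range: ${i}`);
    }
    return dictionary[i];
  });
}
```

In this sketch, supporting the encoding would amount to registering a decoder that reads the dictionary page first and then applies something like `decodeDictionary` to each data page. Whether the actual library is structured this way is an assumption.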

The new parquet-wasm library looks promising. To resolve this issue, and to enable loading of compressed parquet files too, I would need to switch the parquet data provider to a better TS/JS parquet library.

RandomFractals, Sep 07 '22 10:09

More info at: https://github.com/RandomFractals/vscode-data-preview/issues/316#issuecomment-1277766785

RandomFractals, Oct 13 '22 15:10