parquetjs

support stream

Open luckstar77 opened this issue 4 years ago • 3 comments

I would like to know whether it is possible to support reading from a stream.

luckstar77 avatar Apr 21 '20 03:04 luckstar77

This library is heavily based on streams (i.e., each read is essentially a stream read); however, random access is of critical importance. While it's possible to construct Parquet files that can be read sequentially, emitting values at the other end as the bytes arrive, this is not guaranteed to work for every Parquet file, mainly because sections can appear in arbitrary order. For example, the dictionary you need to decode the values of the particular column you are reading might be located anywhere in the file.
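To make the random-access point concrete: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a reader has to look at the end of the file before it can find the metadata. The snippet below is only a minimal sketch of that lookup on a synthetic buffer; `locateFooter` is a hypothetical helper written for illustration and is not part of parquetjs.

```javascript
// Parquet files start and end with the 4-byte magic "PAR1". The 4 bytes just
// before the trailing magic hold the footer length as a little-endian uint32.
const MAGIC = 'PAR1';

// Hypothetical helper: given the last 8 bytes of a file and the total file
// size, work out where the footer metadata starts and how long it is.
function locateFooter(tail, fileSize) {
  if (tail.length !== 8) {
    throw new Error('expected exactly 8 trailing bytes');
  }
  if (tail.toString('latin1', 4, 8) !== MAGIC) {
    throw new Error('trailing magic not found, not a Parquet file');
  }
  const footerLength = tail.readUInt32LE(0);
  const footerOffset = fileSize - 8 - footerLength;
  if (footerOffset < MAGIC.length) {
    throw new Error('footer length is larger than the file');
  }
  return { footerOffset, footerLength };
}

// Synthetic demo: leading magic, 20 bytes of "data", a 10-byte fake footer,
// the length field, and the trailing magic (42 bytes in total).
const fakeFooter = Buffer.alloc(10, 0xab);
const lengthField = Buffer.alloc(4);
lengthField.writeUInt32LE(fakeFooter.length, 0);
const demoFile = Buffer.concat([
  Buffer.from(MAGIC),
  Buffer.alloc(20),
  fakeFooter,
  lengthField,
  Buffer.from(MAGIC),
]);

const demoLocation = locateFooter(
  demoFile.subarray(demoFile.length - 8),
  demoFile.length
);
console.log(demoLocation); // { footerOffset: 24, footerLength: 10 }
```

With a forward-only stream, those last 8 bytes only arrive after everything else has been consumed, which is why a plain read stream is not enough in the general case.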

ZJONSSON avatar Apr 21 '20 14:04 ZJONSSON

yauzl (an unzip library) also requires random access to file contents and has a RandomAccessReader class for this purpose, which allows one to pass in a file path, a file descriptor, or a RandomAccessReader.

safareli avatar Apr 26 '21 15:04 safareli

Actually, it looks like one can construct a ParquetEnvelopeReader using an approach similar to yauzl's and then call ParquetReader.openEnvelopeReader with it. See:

https://github.com/ZJONSSON/parquetjs/blob/3e5d76b781bc9045c95ccfc087e7792480f85667/lib/reader.js#L95-L98

https://github.com/ZJONSSON/parquetjs/blob/3e5d76b781bc9045c95ccfc087e7792480f85667/lib/reader.js#L316-L331
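A rough sketch of what that could look like is below. It builds a pair of callbacks over any source that can serve byte ranges (here an in-memory `Buffer`, but the same shape would fit a file descriptor or HTTP range requests). The names `makeRandomAccessSource`, `readFn`, and `closeFn` are chosen for illustration, and the constructor arguments shown in the trailing comment are an assumption based on the linked lines, so check them against the version of reader.js you have installed.

```javascript
// Hypothetical helper: wrap a Buffer as a random-access source that exposes
// an async "read this byte range" callback plus a close callback.
function makeRandomAccessSource(buffer) {
  // Resolve with `length` bytes starting at `offset`.
  const readFn = async (offset, length) => {
    if (offset < 0 || length < 0 || offset + length > buffer.length) {
      throw new RangeError(`range ${offset}+${length} is out of bounds`);
    }
    return buffer.subarray(offset, offset + length);
  };
  // Nothing to release for an in-memory buffer; a file-backed source would
  // close its descriptor here.
  const closeFn = async () => {};
  return { readFn, closeFn, size: buffer.length };
}

// Demo on a small buffer: ranges can be requested in any order.
const source = makeRandomAccessSource(Buffer.from('abcdefghij'));
source
  .readFn(7, 3)
  .then((tail) => {
    console.log(tail.toString()); // "hij"
    return source.readFn(0, 2);
  })
  .then((head) => console.log(head.toString())); // "ab"

// Assumed wiring into parquetjs (not verified here; the require path depends
// on which fork or package name you installed):
//   const { ParquetReader, ParquetEnvelopeReader } = require(/* your parquetjs package */);
//   const envelope = new ParquetEnvelopeReader(source.readFn, source.closeFn, source.size);
//   const reader = await ParquetReader.openEnvelopeReader(envelope);
```

The key design point is that the source must know its total size up front and answer reads at arbitrary offsets, which is the same contract yauzl's RandomAccessReader expresses.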

safareli avatar Apr 26 '21 15:04 safareli