parquetjs
support stream
I would like to know whether it is possible to support reading from a stream.
This library is heavily based on streams (i.e. each read is essentially a stream read); however, random access is of critical importance. While it's possible to construct parquet files that can be read sequentially, emitting values on the other end as they arrive, this is not guaranteed to work for every parquet file, mainly because sections can appear in arbitrary order. For example, the dictionary you need to decode the values of the particular column you are reading might be located anywhere in the file.
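To make the random-access point concrete, here is a minimal sketch that is not taken from parquetjs. It relies only on the Parquet file layout, in which a file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`, so a reader has to jump to the end of the file before it knows where anything else is. The helper names `readAt` and `readFooterInfo` are made up for illustration.

```javascript
const fs = require('fs');

// Read `length` bytes starting at absolute `offset`, without touching the rest of the file.
function readAt(fd, offset, length) {
  const buffer = Buffer.alloc(length);
  const bytesRead = fs.readSync(fd, buffer, 0, length, offset);
  return buffer.subarray(0, bytesRead);
}

// Locate the footer metadata by reading only the last 8 bytes of the file.
// (Hypothetical helper; it does not parse the metadata itself.)
function readFooterInfo(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const { size } = fs.fstatSync(fd);
    if (size < 12) throw new Error('file too small to be parquet');
    const tail = readAt(fd, size - 8, 8);
    if (tail.toString('latin1', 4, 8) !== 'PAR1') {
      throw new Error('missing PAR1 magic at end of file');
    }
    const metadataLength = tail.readUInt32LE(0);
    return { size, metadataLength, metadataOffset: size - 8 - metadataLength };
  } finally {
    fs.closeSync(fd);
  }
}
```

With a plain forward-only stream, the same information would only become available after the whole file had been consumed, which is why a seekable source matters here.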
yauzl (an unzip library) also requires random access to file contents and has a `RandomAccessReader` class for it, which allows one to pass in either a file path, a file descriptor, or a `RandomAccessReader`.
Actually, it looks like one can construct a `ParquetEnvelopeReader` using a similar approach to the one in yauzl and then call `ParquetReader.openEnvelopeReader` with it. See:
https://github.com/ZJONSSON/parquetjs/blob/3e5d76b781bc9045c95ccfc087e7792480f85667/lib/reader.js#L95-L98
https://github.com/ZJONSSON/parquetjs/blob/3e5d76b781bc9045c95ccfc087e7792480f85667/lib/reader.js#L316-L331
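As a rough sketch of what such a random-access source might look like, the helper below (a hypothetical `openRandomAccessSource`, not part of parquetjs) wraps a file handle in a `readFn(offset, length)` that resolves to a `Buffer`, plus a `closeFn` and the file size. The wiring shown in the trailing comment assumes that `ParquetEnvelopeReader` takes `(readFn, closeFn, fileSize)` in that order, which is my reading of the linked lines and should be checked against the version you use.

```javascript
const fsp = require('fs').promises;

// Build the three pieces a random-access reader needs from any seekable source.
// Here the source is a local file, but readFn could equally issue HTTP range requests.
async function openRandomAccessSource(filePath) {
  const handle = await fsp.open(filePath, 'r');
  const { size } = await handle.stat();

  // Resolve with exactly the bytes in [offset, offset + length), or fewer at end of file.
  const readFn = async (offset, length) => {
    const { bytesRead, buffer } = await handle.read(Buffer.alloc(length), 0, length, offset);
    return buffer.subarray(0, bytesRead);
  };

  const closeFn = () => handle.close();
  return { readFn, closeFn, fileSize: size };
}

// Assumed wiring (unverified constructor signature, see the links above):
//   const { readFn, closeFn, fileSize } = await openRandomAccessSource('data.parquet');
//   const envelope = new ParquetEnvelopeReader(readFn, closeFn, fileSize);
//   const reader = await ParquetReader.openEnvelopeReader(envelope);
```

The point of the design is that the reader never needs to know where the bytes come from, only that it can ask for an arbitrary range.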