parquetjs
Example code for a simple node stream + buffer
There isn't anything obvious in the README about writing to streams that I could see, so I thought this would be useful for others too :)
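A minimal sketch of the in-memory (Buffer) side of this. It rests on an assumption worth double-checking against lib/writer.js and lib/util.js in v0.8.0: that ParquetWriter.openStream(schema, stream) writes each chunk with stream.write(buf, callback) and finishes by calling stream.close(callback). fs.WriteStream has a close(callback) method, but a plain Writable does not, so the hypothetical BufferCollector below adds one that simply ends the stream, and it keeps every chunk in memory.

```javascript
const { Writable } = require('stream');

// Hypothetical helper: a Writable that keeps everything written to it in
// memory. Assumption: parquetjs's ParquetWriter.openStream() calls
// stream.write(buf, callback) for each chunk and stream.close(callback)
// at the end, so a close() method is added here.
class BufferCollector extends Writable {
  constructor() {
    super();
    this.chunks = [];
  }

  // Called by the stream machinery for every chunk passed to write().
  _write(chunk, encoding, callback) {
    this.chunks.push(chunk);
    callback();
  }

  // Plain Writables have no close(); ending the stream is enough here.
  // end() runs the callback once all chunks have been processed.
  close(callback) {
    this.end(callback);
  }

  // Concatenate the collected chunks into a single Buffer.
  toBuffer() {
    return Buffer.concat(this.chunks);
  }
}
```

After opening a writer with ParquetWriter.openStream(schema, collector), appending rows with appendRow() and awaiting writer.close(), collector.toBuffer() should hold the finished Parquet file. This is untested against parquetjs itself.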
@tobinbc Thank you very much for the example. We're working on a new version of parquetjs that will support streams out of the box; the rewrite will take several months, though, given the amount of time we have for this project at the moment. In the meantime, I will gladly add your contribution to the docs. Could you please sign the CLA here: https://github.com/ironSource/opensource-contributor-license-agreement
Thanks
Is there any way I could help with that rewrite? Is it in a branch, and is there a process for contributing to it?
@dobesv we can certainly discuss it, can you email me?
For those wanting to read Parquet files from somewhere other than the local file system, I've found that this fork provides a good example of extending ParquetEnvelopeReader to read from different sources, namely a Buffer, S3, or a URL:
https://github.com/LibertyDSNP/parquetjs/blob/v1.2.0/lib/reader.ts#L378
That code has deviated slightly from the original ParquetEnvelopeReader, which can be found here:
https://github.com/ironSource/parquetjs/blob/v0.8.0/lib/reader.js#L191
But the big idea is the same: if you provide implementations of the following, you can create your own custom ParquetEnvelopeReader.
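const { ParquetEnvelopeReader } = require('parquetjs');

/*
  You supply these three values:
  readFn:   (offset: number, length: number) => Promise<Buffer>
  closeFn:  () => void | Promise<void>
  fileSize: number  (total length of the file in bytes)
*/
const myReader = new ParquetEnvelopeReader(readFn, closeFn, fileSize);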
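For the simplest of those sources, a Buffer already in memory, the three values take only a few lines. This is a minimal sketch, not the fork's code; makeBufferSource is a hypothetical name.

```javascript
// Hypothetical helper: builds the readFn / closeFn / fileSize triple
// for a Buffer that is already in memory.
function makeBufferSource(buffer) {
  // readFn must resolve to exactly `length` bytes starting at `offset`.
  function readFn(offset, length) {
    if (offset < 0 || length < 0 || offset + length > buffer.length) {
      return Promise.reject(
        new RangeError(`cannot read ${length} bytes at offset ${offset} from ${buffer.length} bytes`)
      );
    }
    // subarray returns a view that shares memory with `buffer` (no copy).
    return Promise.resolve(buffer.subarray(offset, offset + length));
  }

  // An in-memory buffer holds no file descriptor or connection to release.
  function closeFn() {}

  return { readFn, closeFn, fileSize: buffer.length };
}
```

The result goes straight into new ParquetEnvelopeReader(readFn, closeFn, fileSize). In v0.8.0, ParquetReader.openFile then calls readHeader() and readFooter() on the envelope reader and passes the resulting metadata to new ParquetReader(metadata, envelopeReader); repeating those steps with a custom envelope reader should give a normal ParquetReader, though this is untested here.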
I have yet to implement this myself, but it seems reasonable that this could be extended to support a generic Node.js Readable stream, such as BlobDownloadResponseParsed.readableStreamBody from @azure/storage-blob.
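One catch: Parquet stores its metadata in a footer at the end of the file, so the reader needs random access, which a one-pass Readable stream cannot provide by itself. The simplest workaround is to drain the stream into memory first. A minimal sketch; streamToSource is a hypothetical name, and holding the whole file in memory may be impractical for large blobs.

```javascript
// Hypothetical helper: drains a Node.js Readable stream into memory and
// exposes the readFn / closeFn / fileSize triple over the result.
async function streamToSource(readable) {
  const chunks = [];
  for await (const chunk of readable) {
    // Chunks are normally Buffers; convert anything else just in case.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  const buffer = Buffer.concat(chunks);

  // Serve random-access reads from the fully buffered data.
  async function readFn(offset, length) {
    if (offset < 0 || length < 0 || offset + length > buffer.length) {
      throw new RangeError(`cannot read ${length} bytes at offset ${offset} from ${buffer.length} bytes`);
    }
    return buffer.subarray(offset, offset + length);
  }

  // The data now lives in memory, so there is nothing left to release.
  async function closeFn() {}

  return { readFn, closeFn, fileSize: buffer.length };
}
```

With @azure/storage-blob in Node.js this might look like: const download = await blobClient.download(); const { readFn, closeFn, fileSize } = await streamToSource(download.readableStreamBody); after which the three values go to new ParquetEnvelopeReader(readFn, closeFn, fileSize).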
Being able to use a generic ReadableStream would also fix this issue and open up the possibility of interfacing with other cloud services: https://github.com/ironSource/parquetjs/issues/110
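When the storage service honours HTTP Range requests (S3 and Azure Blob Storage both do on GET, for example through a pre-signed or SAS URL), an alternative to buffering the whole object is to fetch only the byte ranges the reader asks for. A minimal sketch, assuming Node.js 18+ for the global fetch; openUrlSource is a hypothetical name, and this is not the fork's implementation.

```javascript
// Hypothetical helper, assuming Node.js 18+ (global fetch) and a server
// that honours HTTP Range requests.
async function openUrlSource(url) {
  // Probe with a one-byte range: a 206 reply carries
  // "Content-Range: bytes 0-0/<total>", which gives the file size.
  // GET is used instead of HEAD so URLs signed for GET only still work.
  const probe = await fetch(url, { headers: { Range: 'bytes=0-0' } });
  const match = /\/(\d+)$/.exec(probe.headers.get('content-range') || '');
  await probe.body?.cancel(); // only the headers are needed
  if (probe.status !== 206 || !match) {
    throw new Error(`server did not answer a Range request (status ${probe.status})`);
  }
  const fileSize = Number(match[1]);

  // Fetch exactly the bytes [offset, offset + length).
  async function readFn(offset, length) {
    if (length === 0) {
      return Buffer.alloc(0);
    }
    const last = offset + length - 1; // the Range end index is inclusive
    const res = await fetch(url, { headers: { Range: `bytes=${offset}-${last}` } });
    if (res.status !== 206) {
      // A 200 would mean the server ignored Range and is sending the whole file.
      await res.body?.cancel();
      throw new Error(`expected 206 Partial Content, got status ${res.status}`);
    }
    const body = Buffer.from(await res.arrayBuffer());
    if (body.length !== length) {
      throw new RangeError(`asked for ${length} bytes at offset ${offset}, got ${body.length}`);
    }
    return body;
  }

  // Each request is independent, so there is nothing to release.
  async function closeFn() {}

  return { readFn, closeFn, fileSize };
}
```

The trade-off is one HTTP round trip per read call instead of holding the whole file in memory, which suits large files where only the footer and a few column chunks are needed.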