parquetjs
How to upload the Parquet file to S3?
var writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet');
await writer.appendRow({name: 'apples', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'oranges', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();
I have done this part, and the file gets saved locally. How can I get the file into a variable? If I can hold it in a variable, it will be easier to upload.
If it helps, I managed to do it using pure streams, but I don't know whether appendRow is compatible with stream mode:
- I receive my stream from a request; here I will call it sourceStream (you need to create your own Readable stream, I guess)
- I create a ParquetTransformer, here called parquetStream, and pipe sourceStream into it
- I create an AWS S3 putObjectRequest with the resulting stream as Body, using the official AWS SDK
// assume we already have sourceStream and parquetStream
s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: sourceStream.pipe(parquetStream)
}); // I call .promise() on the result here, but that is specific to my usage
Using streams has the advantage of saving RAM, because the file is never held in memory all at once.
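For comparison, the non-streaming route the original question asks about (getting the file into a variable) can be sketched as follows. This is a minimal sketch under assumptions: s3 is assumed to be an AWS SDK v2 S3 client like s3Bucket above, and the function name uploadLocalFile and its parameters are hypothetical. It reads the finished file into a Buffer, so the whole file sits in memory, which is exactly the cost the streaming approach avoids.

```javascript
const fs = require('fs');

// Hypothetical helper: reads an already-written local file into a Buffer
// and passes it to the S3 client as the request Body.
// `s3` is assumed to be an AWS SDK v2 S3 instance; `bucket` and `key` are placeholders.
async function uploadLocalFile(s3, filePath, bucket, key) {
  // The entire file is loaded into memory here.
  const body = await fs.promises.readFile(filePath);
  return s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();
}
```

It would be called after writer.close() has resolved, for example uploadLocalFile(s3Bucket, 'fruits.parquet', 'bucketName', 'path/of/the/file').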
@sambonbonne, my friend!
Could you post a more complete example, please?
I don't understand anything about streams.
Thank you very much!
@eliasrosa I'm sorry, I don't know how to make a more complete example. I can add some variables or something but I'm not sure it will help:
const { ParquetTransformer } = require('parquetjs');

// the constructor takes your ParquetSchema first, then optional writer options
const parquetStream = new ParquetTransformer(schema, { /* optional writer options */ });
// assuming you already have a Readable source stream as sourceStream
const conversionStream = sourceStream.pipe(parquetStream);
s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: conversionStream
});
I don't want to discourage you, but I think you should not try to use streams without understanding them. Streams are important in Node.js and have multiple advantages, so learning more about them would probably be useful if you use Node.js.
(I hope you won't take this answer as an attack, I just don't know how I can help better)
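Putting the steps from this thread together, a fuller sketch might look like the following. It is a sketch under assumptions, not a verified recipe for parquetjs: s3 is assumed to be an AWS SDK v2 S3 client (s3Bucket above), transformer is assumed to be a ParquetTransformer or any other Transform that accepts one row object per chunk, and the name uploadRows and its parameters are hypothetical. Readable.from turns an in-memory array (or any iterable) of rows into the sourceStream.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: streams row objects through a transform and uploads the result.
// `s3` is assumed to be an AWS SDK v2 S3 instance; `transformer` is assumed to be
// an object-mode Transform such as a parquetjs ParquetTransformer.
function uploadRows(s3, rows, transformer, bucket, key) {
  // Readable.from yields one row object per chunk (object mode).
  const sourceStream = Readable.from(rows);

  // pipe() does not forward errors, so pass source errors on explicitly.
  sourceStream.on('error', (err) => transformer.destroy(err));

  return s3.upload({
    Bucket: bucket,
    Key: key,
    Body: sourceStream.pipe(transformer) // S3 reads the transformed bytes as they arrive
  }).promise();
}
```

With parquetjs this would presumably be called as uploadRows(s3Bucket, rows, new ParquetTransformer(schema), 'bucketName', 'path/of/the/file'), where rows is an array of objects shaped like the ones passed to appendRow in the question.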