parquetjs

How to upload the parquet file to s3?

Open aijazkhan81 opened this issue 3 years ago • 3 comments

var writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet');
await writer.appendRow({name: 'apples', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'oranges', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();

I have done this part, and the file is saved locally. How can I get the file contents into a variable? If I could, it would be easier to upload the file.
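One possible way to get the saved file into a variable is to read it back into a Buffer once the writer has been closed. This is only a minimal sketch: `uploadLocalFile` is a hypothetical helper, `s3` is assumed to be an AWS SDK v2 S3 client, and the bucket and key are placeholders.

```javascript
const fs = require('fs/promises');

// Hypothetical helper: `s3` is assumed to be an AWS SDK v2 S3 client (new AWS.S3()).
async function uploadLocalFile(s3, filePath, bucket, key) {
  // Read the finished file into a Buffer; call this only after `await writer.close()`.
  const body = await fs.readFile(filePath);
  // Pass the Buffer as the Body of the upload request.
  return s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();
}
```

Reading the whole file into memory is simple, but memory use grows with the file size.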

aijazkhan81 avatar Aug 06 '21 07:08 aijazkhan81

If it helps, I managed to do it using pure streams, though I don't know whether appendRow is compatible with stream mode:

  1. I receive my stream from a request; here I call it sourceStream (you will need to create your own Readable stream, I guess)
  2. I create a ParquetTransformer, here called parquetStream, and pipe sourceStream into it
  3. I create an AWS S3 putObjectRequest with the piped stream as Body, using the official AWS SDK

// assuming sourceStream and parquetStream already exist
s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: sourceStream.pipe(parquetStream)
}); // I call .promise() on this, but that is specific to my usage

Using streams has the advantage of saving RAM, since the whole file never has to be held in memory at once.
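The three steps above can be sketched as one function. This is an illustration under assumptions, not a tested recipe: `uploadRowsAsStream` is a hypothetical helper, `s3` is assumed to be an AWS SDK v2 S3 client, and `transformer` is assumed to be a Transform stream such as a `parquet.ParquetTransformer` built from your schema.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: `s3` is assumed to be an AWS SDK v2 S3 client and
// `transformer` a Transform stream (for example a ParquetTransformer).
function uploadRowsAsStream(s3, rows, transformer, bucket, key) {
  // Step 1: wrap the in-memory rows in an object-mode Readable stream.
  const sourceStream = Readable.from(rows);
  // Step 2: pipe the rows into the transformer, which emits the encoded bytes.
  const body = sourceStream.pipe(transformer);
  // Step 3: hand the resulting stream to S3 as the Body of the upload.
  return s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();
}
```

Note that `pipe` does not forward errors from the source stream to the transformer, so real code would probably also attach an `'error'` handler to `sourceStream`.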

sambonbonne avatar Aug 23 '21 14:08 sambonbonne

@sambonbonne, my friend!

Could you post a more complete example, please?

I don't understand anything about streams.

Thank you very much!

eliasrosa avatar Aug 25 '21 04:08 eliasrosa

@eliasrosa I'm sorry, I don't know how to make a more complete example. I can add some variables or something, but I'm not sure it will help:

const parquet = require('parquetjs');

// `schema` is your parquet.ParquetSchema; an optional second argument takes transform options
const parquetStream = new parquet.ParquetTransformer(schema);

// assuming you already have a Readable source stream named sourceStream
const conversionStream = sourceStream.pipe(parquetStream);

s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: conversionStream
});

I don't want to discourage you, but I think you should not try to use streams without understanding them. Streams are important in Node.js and have multiple advantages, so learning more about them would probably be useful if you use Node.js.
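As a starting point, here is a very small sketch of what piping does, using only Node.js built-ins. The names `collect`, `upperCaseNames`, and `demo` are made up for illustration; the Transform plays the role that the parquet transformer plays in the snippet above.

```javascript
const { Readable, Transform } = require('stream');

// Collects everything a stream emits into one string (illustration only).
async function collect(stream) {
  let out = '';
  for await (const chunk of stream) out += chunk.toString();
  return out;
}

// An object-mode Transform that turns each row into its upper-cased name.
function upperCaseNames() {
  return new Transform({
    objectMode: true,
    transform(row, _encoding, callback) {
      callback(null, row.name.toUpperCase() + '\n');
    },
  });
}

// Rows flow one at a time from the source, through the transform, to the consumer.
function demo() {
  const source = Readable.from([{ name: 'apples' }, { name: 'oranges' }]);
  return collect(source.pipe(upperCaseNames()));
}
```

Calling `demo()` returns a promise for the transformed text; only one row is in flight at a time, which is where the memory savings come from.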

(I hope you won't take this answer as an attack; I just don't know how to help better.)

sambonbonne avatar Aug 25 '21 16:08 sambonbonne