parquetjs

Can parquetjs write a file to AWS S3?

Open rkbsoftsolutions opened this issue 5 years ago • 9 comments

I am using parquetjs in Meteor.js and want to create a Parquet data file.

When I call ParquetWriter.openFile(schema, filePath), I get the error below.

W20191226-23:56:11.534(5.5)? (STDERR) (node:6898) UnhandledPromiseRejectionWarning: missing required field: assets
W20191226-23:56:11.534(5.5)? (STDERR) (node:6898) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 14).

It seems related to a path or permission issue.

Instead of creating a local file, is it possible to upload the Parquet file to AWS S3?
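For what it's worth, the message `missing required field: assets` reads like a row-validation failure rather than a path or permission problem: parquetjs appears to raise it when a row passed to `appendRow` lacks a field that the schema does not mark `optional: true`. Below is a minimal sketch of a pre-check, assuming a plain schema definition object of the kind passed to `new parquet.ParquetSchema(...)`. The helper name `findMissingRequiredFields` and the sample fields are hypothetical, only top-level fields are checked, and treating `null` as missing is an assumption.

```javascript
// Hypothetical helper: list top-level fields that the schema definition treats
// as required (neither `optional: true` nor `repeated: true`) and that the row
// does not provide. Treating null like undefined is an assumption.
function findMissingRequiredFields(schemaDef, row) {
  return Object.keys(schemaDef).filter((name) => {
    const field = schemaDef[name];
    const isRequired = !field.optional && !field.repeated;
    const value = row[name];
    return isRequired && (value === undefined || value === null);
  });
}

// Sample schema definition (illustrative field names and types).
const schemaDef = {
  name: { type: 'UTF8' },
  assets: { type: 'INT64' },
  note: { type: 'UTF8', optional: true },
};

console.log(findMissingRequiredFields(schemaDef, { name: 'a' })); // [ 'assets' ]
```

Running a check like this before each `appendRow` call, and wrapping the write in `try`/`catch`, should turn the unhandled rejection into an error that names the offending row.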

rkbsoftsolutions avatar Dec 26 '19 18:12 rkbsoftsolutions

@staronline1985 hi, did you find a solution for uploading Parquet to AWS S3?

taozhiyuzhuo avatar Feb 26 '20 10:02 taozhiyuzhuo

Yes, I found a solution for uploading Parquet to AWS S3, but I am having an issue with large files. For example, I read a JSON or CSV file and convert it to Parquet format. It keeps all the data in memory until the Parquet writer is closed, so it will not work for me with large files. My job is to read a JSON file from S3, convert it to Parquet format, and upload it to S3 again.

rkbsoftsolutions avatar Feb 26 '20 16:02 rkbsoftsolutions

I think it should be stream based: read the data as a stream, convert it, and upload the stream to S3.
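A minimal sketch of the reading side, assuming the source file is newline-delimited JSON (one object per line); a single large JSON array would need a streaming JSON parser instead. The names `rowsFromNdjson` and `demo` are hypothetical. Only Node's built-in `readline` and `stream` modules are used, so one line is held in memory at a time.

```javascript
const readline = require('readline');
const { Readable } = require('stream');

// Yield one parsed object per non-empty line of a newline-delimited JSON stream.
async function* rowsFromNdjson(input) {
  const lines = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    if (line.trim() !== '') yield JSON.parse(line);
  }
}

// Demo with an in-memory stream standing in for the S3 object body.
// Note that the second record is split across two chunks.
async function demo() {
  const source = Readable.from([
    Buffer.from('{"id":1}\n{"id":'),
    Buffer.from('2}\n'),
  ]);
  const rows = [];
  for await (const row of rowsFromNdjson(source)) rows.push(row);
  return rows;
}

demo().then((rows) => console.log(rows));
```

With the AWS SDK for JavaScript v2, `s3.getObject(params).createReadStream()` returns a readable stream that could be passed as `input` here.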

rkbsoftsolutions avatar Feb 26 '20 16:02 rkbsoftsolutions

@staronline1985 I have the same task. For now, I just need to convert a local CSV file to Parquet and upload it to S3. But that requires creating a local Parquet file and then reading it with readFileSync as a buffer to upload. I want to upload to S3 directly, without saving locally. How can I do that?

taozhiyuzhuo avatar Feb 27 '20 10:02 taozhiyuzhuo

I am doing the same and waiting to see whether parquetjs makes this possible. Otherwise I will go with another repo.

rkbsoftsolutions avatar Feb 29 '20 08:02 rkbsoftsolutions

You need to use ParquetTransformer as mentioned in #76

muratcorlu avatar Jul 15 '20 22:07 muratcorlu

Do you have an example of doing it?

govthamreddy avatar Nov 19 '20 12:11 govthamreddy

@govthamreddy do you have a working example of pushing it to S3?

aijazkhan81 avatar Aug 06 '21 06:08 aijazkhan81

Example from #76

https://github.com/ironSource/parquetjs/issues/76#issuecomment-1312158235
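For readers who want the general shape without following the link, here is a minimal sketch of the ParquetTransformer approach. It is not the code from the linked comment. It assumes the AWS SDK for JavaScript v2, whose `s3.upload` accepts a stream as `Body`, and it takes the `parquet` module and the `s3` client as arguments. The function name `uploadRowsAsParquet`, the bucket, and the key are hypothetical.

```javascript
const { PassThrough, Readable, pipeline } = require('stream');

// Sketch: pipe row objects through parquetjs's ParquetTransformer into an S3
// upload, so no local Parquet file is created.
//   parquet: the module returned by require('parquetjs')
//   s3:      an AWS SDK v2 S3 client, e.g. new AWS.S3()
//   rows:    an array or (async) iterable of row objects matching the schema
function uploadRowsAsParquet({ parquet, s3, schema, rows, bucket, key }) {
  const transformer = new parquet.ParquetTransformer(schema);
  const body = new PassThrough();

  // Start the upload first so that S3 consumes bytes as they are produced.
  const uploaded = s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();

  // pipeline() forwards errors and destroys every stream if one of them fails.
  const piped = new Promise((resolve, reject) => {
    pipeline(Readable.from(rows), transformer, body, (err) =>
      err ? reject(err) : resolve()
    );
  });

  return Promise.all([piped, uploaded]).then(([, result]) => result);
}

// Possible usage (bucket and key are placeholders):
//   const parquet = require('parquetjs');
//   const AWS = require('aws-sdk');
//   const schema = new parquet.ParquetSchema({ name: { type: 'UTF8' } });
//   uploadRowsAsParquet({
//     parquet, s3: new AWS.S3(), schema,
//     rows: [{ name: 'a' }], bucket: 'my-bucket', key: 'out.parquet',
//   }).then((result) => console.log(result.Location));
```

Passing the dependencies in keeps the sketch independent of how credentials and regions are configured; `rows` can be any iterable, including an async generator that reads the source file as a stream.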

magno32 avatar Nov 11 '22 20:11 magno32