parquetjs

Cannot write more than once

Open balajiaruna opened this issue 6 years ago • 2 comments

I specified append mode in the options, but fruits.parquet contains only the first 2 rows (apples and oranges). What am I missing?

Thanks!

var opts = {flags: 'a'};

var writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet', opts);

// append a few rows to the file
await writer.appendRow({name: 'apples', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'oranges', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();

writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet', opts);

// append a few more rows to the file
await writer.appendRow({name: 'banana', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'peaches', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();

balajiaruna avatar Apr 09 '19 02:04 balajiaruna

Appending to a parquet file is a little more complicated than specifying the append flag on the file, because the file ends with its metadata and a footer.

One way to do a pure append is to first read the metadata, then append manually to the file, and finalize by writing the updated metadata and the footer at the end. The old metadata would essentially be orphaned.
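To illustrate why the append flag alone is not enough, here is a minimal sketch that locates the footer in a file already loaded into a Buffer. It assumes the standard Parquet layout (the 4-byte magic `PAR1` at both ends, with a 4-byte little-endian metadata length just before the trailing magic). `locateFooter` is a made-up helper name and not part of parquetjs.

```javascript
// Hypothetical helper (not a parquetjs API): find where the footer
// metadata sits in a Parquet file held in memory.
const MAGIC = 'PAR1';

function locateFooter(buf) {
  // Smallest possible file: leading magic + length field + trailing magic.
  if (buf.length < 12) {
    throw new Error('too short to be a Parquet file');
  }
  const head = buf.toString('latin1', 0, 4);
  const tail = buf.toString('latin1', buf.length - 4);
  if (head !== MAGIC || tail !== MAGIC) {
    throw new Error('missing PAR1 magic');
  }
  // The 4 bytes before the trailing magic hold the metadata length.
  const metadataLength = buf.readUInt32LE(buf.length - 8);
  const metadataOffset = buf.length - 8 - metadataLength;
  if (metadataOffset < 4) {
    throw new Error('corrupt footer length');
  }
  return { metadataOffset, metadataLength };
}
```

Bytes written after the trailing `PAR1` leave the file without a valid ending, and a reader that starts from the end of the file will not find the footer. A true append would have to write new row groups starting at `metadataOffset` and then write a fresh metadata block and footer after them.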

ZJONSSON avatar Apr 10 '19 16:04 ZJONSSON

FWIW: I just grouped all the rows I needed for a particular parquet file into a custom data structure. Once built, I looped through that structure and appended to the parquet file within a single open/close block. That solved the problem for me, since I no longer had to worry about appending via the parquetjs API.
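A rough sketch of that batching approach, under some assumptions: `parquet` is the parquetjs module, `schema` is a schema like the one in the question, and `writeGrouped` and `rowsByFile` are made-up names for illustration. The module is passed in as an argument so the snippet stands on its own.

```javascript
// Hypothetical helper: rowsByFile maps a file path to the full list of
// rows destined for that file, e.g. { 'fruits.parquet': [row1, row2] }.
async function writeGrouped(parquet, schema, rowsByFile) {
  for (const [path, rows] of Object.entries(rowsByFile)) {
    // Open each file exactly once, without the append flag.
    const writer = await parquet.ParquetWriter.openFile(schema, path);
    try {
      for (const row of rows) {
        await writer.appendRow(row);
      }
    } finally {
      // close() is what finalizes the file, so call it exactly once.
      await writer.close();
    }
  }
}
```

With the rows from the question, all four fruits would go into one array under `'fruits.parquet'`, so the file is opened and closed a single time.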

alienintheheights avatar Aug 16 '19 19:08 alienintheheights