parquetjs

Cannot write a parquet file having a comma in one of its headers

Open bartero opened this issue 4 years ago • 1 comments

I came across a very unfortunate problem using this library.
Whenever I try to read a parquet file that was created with this same library and contains a comma (,) in any of its headers, I get this error while awaiting parquetReader.getCursor().next(): TypeError: Cannot read property 'fields' of undefined
Stack trace:

      at ParquetSchema.findField (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/schema.js:35:22)
      at Object.exports.materializeRecords (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/shred.js:164:26)
      at ParquetCursor.next (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/reader.js:62:40)

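I guess this is caused by a rather "unsafe" operation in the parquetjs/lib/schema.js file on line 28: path.split(","). If a column path reaches that line as a single string, a header that itself contains a comma is split into two path segments that do not exist in the schema.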
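To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It does not use parquetjs and is not its source code: the `fields` object and the functions `findFieldUnsafe` and `findFieldSafe` are hypothetical stand-ins, written on the assumption that the lookup receives a comma-joined string and splits it back into path segments. The comma-joined string itself is easy to produce by accident, because JavaScript stringifies an array with commas when it is used as an object key.

```javascript
// Simplified model of a schema: header name -> field definition.
// Hypothetical data, NOT the parquetjs internals.
const fields = {
  "id": { type: "INT64" },
  "last,first": { type: "UTF8" }, // header that contains a comma
};

// Hypothetical lookup that re-splits string paths on commas.
function findFieldUnsafe(fields, path) {
  const parts = Array.isArray(path) ? path.slice() : path.split(",");
  let node = fields;
  while (parts.length > 1) {
    // If parts[0] is not a real field, node[parts[0]] is undefined
    // and reading `.fields` throws a TypeError.
    node = node[parts.shift()].fields;
  }
  return node[parts[0]];
}

// Variant that only accepts array paths, so a name is never re-split.
function findFieldSafe(fields, path) {
  if (!Array.isArray(path)) {
    throw new TypeError("path must be an array of field names");
  }
  let node = fields;
  for (let i = 0; i < path.length - 1; i++) {
    const child = node[path[i]];
    if (!child || !child.fields) return undefined;
    node = child.fields;
  }
  return node[path[path.length - 1]];
}

// An array used as an object key is stringified with commas,
// so the one-element path ["last,first"] becomes the key "last,first".
const key = String(["last,first"]);

try {
  findFieldUnsafe(fields, key); // splits into ["last", "first"]
} catch (err) {
  console.log(err instanceof TypeError); // the lookup fails on the bogus "last" segment
}

console.log(findFieldSafe(fields, ["last,first"]).type);
```

Under these assumptions, keeping the path as an array all the way through (or joining it with a separator that cannot appear in a header) avoids the ambiguity. Whether that matches the actual code path in parquetjs would need to be confirmed against lib/schema.js and lib/shred.js.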

I would be very grateful for any help with this problem. Thank you!

bartero, Oct 09 '20 10:10

Hi! I have written a solution for it! Look below :-)

bartero, Oct 14 '20 16:10