Cannot write a parquet file having a comma in one of its headers

Open • bartero opened this issue 4 years ago • 1 comment

I came across a very unfortunate problem using this library.
Whenever I try to read a parquet file that was created with this same library and contains a comma (,) in any of its headers, I get this error on await parquetReader.getCursor().next(): TypeError: Cannot read property 'fields' of undefined
Stack trace:

    at ParquetSchema.findField (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/schema.js:35:22)
    at Object.exports.materializeRecords (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/shred.js:164:26)
    at ParquetCursor.next (../../.yarn/cache/parquetjs-npm-0.11.2-9df3a54481-63137e17bc.zip/node_modules/parquetjs/lib/reader.js:62:40)

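I guess this is caused by a rather "unsafe" operation in the parquetjs/lib/schema.js file on line 28: path.split(","). If a header itself contains a comma, the split would cut it into several path segments, and the lookup for the first segment would then find nothing in the schema.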
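To illustrate the suspected failure mode, here is a minimal sketch. It is not the actual parquetjs source: the findField function, the fields object, and the "price,usd" header are hypothetical stand-ins, and the only detail taken from the report is the path.split(",") call.

```javascript
// Simplified stand-in for a schema lookup; NOT the actual parquetjs code.
// Assumption: a string path is split on commas, an array path is used as-is.
function findField(fields, path) {
  const parts = Array.isArray(path) ? path.slice() : path.split(",");
  let node = fields;
  // Walk every segment except the last one as a nested group.
  while (parts.length > 1) {
    // If the segment is unknown, node[...] is undefined and reading
    // `.fields` from it throws a TypeError.
    node = node[parts.shift()].fields;
  }
  return node[parts[0]];
}

// Hypothetical schema with a single top-level column whose name contains a comma.
const fields = { "price,usd": { type: "DOUBLE" } };

// String path: the header is split into "price" and "usd", so the lookup fails.
let caught = null;
try {
  findField(fields, "price,usd");
} catch (err) {
  caught = err;
}
console.log(caught instanceof TypeError, caught && caught.message);

// Array path: no splitting happens, so the column is found.
const found = findField(fields, ["price,usd"]);
console.log(found.type);
```

In this sketch, passing the path as an array sidesteps the split entirely. Whether the library can always pass an array at the failing call site is something I have not verified.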

I would be very grateful for any help with this problem. Thank you.

bartero, Oct 09 '20 09:10

I have prepared a fix for this problem here: https://github.com/ironSource/parquetjs/pull/118. Please have a look! :-) I am not sure what the relationship between ZJONSSON/parquetjs and ironSource/parquetjs is.

bartero, Oct 14 '20 18:10