parquetjs
[NodeJS] RangeError [ERR_OUT_OF_RANGE] when reading a parquet file
Reading a parquet file that contains more than one record with the parquet reader fails with RangeError [ERR_OUT_OF_RANGE]: The value of "offset" is out of range. It must be >= 0 and <= 79. Received 604307758 (error stack attached below).
If the parquet file contains only 1 record, then it works fine.
"parquetjs": "^0.11.2"
Node version: v19.0.1
NPM version: 8.19.2
The parquet files are attached to this thread: Archive.zip
Test script:
import parquetjs from 'parquetjs';
const { ParquetReader } = parquetjs;

async function readParquetFile() {
  const reader = await ParquetReader.openFile('doesntwork.parquet');
  const cursor = reader.getCursor();
  let record = '';
  while (record !== undefined) {
    record = await cursor.next();
    console.log(">>RECORD", record);
    if (!record) {
      break;
    }
  }
}

readParquetFile();
node src/operations/test.js
/test/node_modules/brotli/build/encode.js:3
1<process.argv.length?process.argv[1].replace(/\\/g,"/"):"unknown-program");b.arguments=process.argv.slice(2);"undefined"!==typeof module&&(module.exports=b);process.on("uncaughtException",function(a){if(!(a instanceof y))throw a;});b.inspect=function(){return"[Emscripten Module object]"}}else if(x)b.print||(b.print=print),"undefined"!=typeof printErr&&(b.printErr=printErr),b.read="undefined"!=typeof read?read:function(){throw"no read() available (jsc?)";},b.readBinary=function(a){if("function"===
^
RangeError [ERR_OUT_OF_RANGE]: The value of "offset" is out of range. It must be >= 0 and <= 79. Received 604307758
at new NodeError (node:internal/errors:393:5)
at boundsError (node:internal/buffer:86:9)
at Buffer.readUInt32LE (node:internal/buffer:220:5)
at decodeValues_BYTE_ARRAY (/test/node_modules/parquetjs/lib/codec/plain.js:168:29)
at exports.decodeValues (/test/node_modules/parquetjs/lib/codec/plain.js:266:14)
at decodeValues (/test/node_modules/parquetjs/lib/reader.js:294:34)
at decodeDataPage (/test/node_modules/parquetjs/lib/reader.js:389:16)
at decodeDataPages (/test/node_modules/parquetjs/lib/reader.js:322:20)
at ParquetEnvelopeReader.readColumnChunk (/test/node_modules/parquetjs/lib/reader.js:255:12)
at async ParquetEnvelopeReader.readRowGroup (/test/node_modules/parquetjs/lib/reader.js:231:35) {
code: 'ERR_OUT_OF_RANGE'
}
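The trace shows the failure inside Buffer.readUInt32LE, called from decodeValues_BYTE_ARRAY. As a rough illustration of how an offset like 604307758 can appear, here is a minimal sketch. It assumes each value is stored as a 4-byte little-endian length followed by that many bytes, which is the Parquet PLAIN layout for BYTE_ARRAY. The function decodeLengthPrefixed and the 83-byte buffer are made up for this example. This is not parquetjs code and not a confirmed diagnosis.

```javascript
// Hypothetical decoder, NOT parquetjs code: reads `count` values that are each
// stored as a 4-byte little-endian length followed by that many bytes.
function decodeLengthPrefixed(buf, count) {
  const values = [];
  let offset = 0;
  for (let i = 0; i < count; i++) {
    const len = buf.readUInt32LE(offset); // throws if offset > buf.length - 4
    offset += 4;
    values.push(buf.subarray(offset, offset + len)); // subarray clamps silently
    offset += len;
  }
  return values;
}

// Made-up 83-byte page whose first four bytes are not really a length.
const page = Buffer.alloc(83);
page.writeUInt32LE(604307754, 0);

// Decoding a single value does not raise an error.
const single = decodeLengthPrefixed(page, 1);

let errorCode = null;
try {
  // The second length read starts at offset 4 + 604307754 = 604307758.
  decodeLengthPrefixed(page, 2);
} catch (err) {
  errorCode = err.code;
}
console.log(single.length, errorCode);
```

If the bytes being decoded are not really length-prefixed (for example, because the page uses a different encoding than the decoder expects), the first bogus length goes unnoticed for a single value but pushes the next read out of range. That would be consistent with one-record files working, but it is only a guess.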
Is there any solution for the above issue? I ran into the same error.
This happens to me whenever I try to read a file that was saved with pandas.
Did you find any workaround for this?
Yes and no, my workaround was:
yarn remove parquetjs
df.to_json("./life/is/very-short.json") # pandas to_json
Might not be what you anticipated.
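For anyone taking the same route: with its default settings, DataFrame.to_json writes column-oriented JSON, i.e. { column: { rowIndex: value } }, rather than a list of records. A minimal sketch of turning that shape back into per-row records on the Node side could look like this. The helper columnsToRecords and the sample data are made up for illustration.

```javascript
// Hypothetical helper: turns pandas' default column-oriented JSON
// ({ column: { rowIndex: value } }) into an array of row objects.
function columnsToRecords(columns) {
  const rows = new Map(); // rowIndex -> record, in first-seen order
  for (const [column, cells] of Object.entries(columns)) {
    for (const [rowIndex, value] of Object.entries(cells)) {
      if (!rows.has(rowIndex)) rows.set(rowIndex, {});
      rows.get(rowIndex)[column] = value;
    }
  }
  return [...rows.values()];
}

// Made-up sample shaped like df.to_json() output for a two-row frame.
const exported = '{"name":{"0":"a","1":"b"},"count":{"0":1,"1":2}}';
const records = columnsToRecords(JSON.parse(exported));
for (const record of records) {
  console.log('>>RECORD', record);
}
```

In practice the string would come from the exported file instead of a literal. Passing orient="records" to to_json on the pandas side should make the conversion unnecessary.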
On 03.01.2024 at 08:24, Tanishq Saini wrote:
any workaround you did to overcome this ?