node-csvtojson
Failure in garbage collection during rowSplit
I use your library to prep data for a COVID-19 charting tool.
The source file grows every day, so to minimize memory usage I use the row filter (.subscribe) to accept only the rows that match the selected locale and date range:
csvtojson()
  .fromFile(payload.filename) // input csv
  .subscribe((jsonObj, index) => {
    try {
      // if this row is for one of the requested locations
      if (self.fieldtest[payload.config.type](jsonObj, payload)) {
        // and its date falls within the requested range
        if (
          moment(
            jsonObj[payload.fields.date_fieldname],
            self.date_mask[payload.config.type]
          ).isSameOrAfter(start)
        ) {
          // console.log("saving data for location =" + jsonObj[payload.fields.location_fieldname]);
          payload.location[
            jsonObj[payload.fields.location_fieldname]
          ].push(jsonObj);
        }
      }
    } catch (error) {
      console.log("location undefined =" + error);
    }
  })
  .then(
    (result) => {
      // all done, tell the topmost function we completed
      if (payload.config.debug)
        console.log("done processing file id=" + payload.id);
      // get the 1st promise resolver (if any), and send the data back
      if (payload.resolve.length) {
        payload.resolve.shift()({
          data: payload.location,
          payload: payload,
        });
      }
    },
    (error) => {
      console.log("error on cvt file = " + payload.filename + ": " + error);
    }
  );
It's possible that multiple charters request data concurrently, so I use a promise mechanism to allow only one of them to go fetch the data (13 MB):
-rw-r--r-- 1 pi pi 13668624 Dec 22 08:06 countries-rawdata-12-22-2020.csv
and the others wait. As soon as the data arrives and is written to disk (in the fs async write callback), I resolve all the promises for the waiting tasks and let them access the data file (overlapping with each other wherever async services allow).
At this point, more than one invocation of csvtojson can be active in the same process space.
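The promise mechanism described above can be sketched roughly as follows. This is a minimal illustration of the "one fetcher, many waiters" pattern, not the actual module code; the names `getData`, `fetchData`, and `waiters` are hypothetical:

```javascript
// Sketch: first caller triggers the fetch; later callers just queue a
// resolver and share the same result when the fetch completes.
const waiters = [];    // pending promise resolvers (like payload.resolve)
let fetching = false;  // true while one task owns the download

function getData(fetchData) {
  return new Promise((resolve, reject) => {
    waiters.push(resolve);
    if (!fetching) {
      fetching = true;
      fetchData().then((data) => {
        fetching = false;
        // resolve every waiting task with the same data
        while (waiters.length) waiters.shift()(data);
      }, reject);
    }
  });
}
```

Once the waiters are all released, each one can start its own csvtojson parse, which is how several parses end up running concurrently.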
I have a user with two instances on a Pi 3, where I can reproduce the problem:
we get a consistent garbage-collection trap, almost always in rowSplit.
<--- Last few GCs --->
[12157:0x5c65000] 29043 ms: Mark-sweep 230.9 (231.9) -> 230.6 (231.9) MB, 671.7 / 0.1 ms (+ 0.1 ms in 12 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 693 ms) (average mu = 0.097, current mu = 0.048) allocation fai
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0x2265d24]
Security context: 0x4e68cead <JSObject>
1: split [0x4e6841cd](this=0x49d89229 <String[154]: SSD,Africa,South Sudan,2020-04-06,1.0,0.0,,,,,0.089,0.0,,,,,,,,,,,,,,,,,,,,,,,,,73.15,11193729.0,,19.2,3.441,2.032,1569.888,,280.775,10.43,,,,,57.85,0.388>,0x381044f1 <String[#1]: ,>)
2: parseMultiLines [0x3f772ed1] [/home/pi/MagicMirror/modules/MyCovid19/node_modules/csvtojson/v2/rowSplit.js:~203] [pc=0x427...
I cannot recreate it on an old Intel i7-2700K with 32 GB of RAM running Ubuntu, even with 10 charting tasks.
I can increase the Node heap with NODE_OPTIONS="--max_old_space_size=2048",
but this doesn't change the result, so this 'sounds' like a global heap variable being overwritten unexpectedly.
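One way to check whether the heap really fills up under concurrent parses is to log V8 memory stats while the charters run. A minimal sketch (the 5-second interval is arbitrary):

```javascript
// Sketch: periodically log heap usage so heap growth across concurrent
// csvtojson invocations shows up in the console (values in MB).
const MB = 1048576;
const timer = setInterval(() => {
  const m = process.memoryUsage();
  console.log(
    "heapUsed=" + (m.heapUsed / MB).toFixed(1) + "MB " +
    "heapTotal=" + (m.heapTotal / MB).toFixed(1) + "MB " +
    "rss=" + (m.rss / MB).toFixed(1) + "MB"
  );
}, 5000);
timer.unref(); // don't keep the process alive just for monitoring
```

If heapUsed climbs roughly in proportion to the number of active parses, the crash is ordinary memory exhaustion on the Pi's small heap rather than corruption.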
Debugging guidance welcomed.
This code has been running fine since late March. The user and I have both started with fresh system installs and installed all new code: Node 10.23.0 and npm 6.14.8. Pi memory:
free -m
total used free shared buff/cache available
Mem: 925 210 398 47 317 609
Swap: 2047 7 2040
As a workaround, I changed my app to allow only one invocation of csvtojson at a time; no more heap crash.
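The serialization workaround can be sketched as a simple promise-chain mutex; this is an illustration of the idea, not the actual app code, and `parseSerially` is a hypothetical name:

```javascript
// Sketch: queue parse tasks on a single promise chain so only one
// csvtojson invocation holds its working set at a time.
let chain = Promise.resolve();

function parseSerially(task) {
  // run the task after everything already queued, whether or not a
  // previous task failed
  const run = chain.then(task, task);
  chain = run.catch(() => {}); // keep the chain alive on errors
  return run;
}
```

Each charter would wrap its csvtojson call, e.g. `parseSerially(() => csvtojson().fromFile(name).subscribe(...))`, trading some latency for a bounded peak heap on the Pi.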