node-csvtojson
Failure in garbage collection during rowSplit
I use your library to prep data for a COVID-19 charting tool.
The source file grows every day, so to minimize memory usage I use the row filter (.subscribe) to accept only the rows that match the selected locale and date range:
csvtojson()
  .fromFile(payload.filename) // input csv
  .subscribe((jsonObj, index) => {
    try {
      // if this row is for one of the requested locations
      if (self.fieldtest[payload.config.type](jsonObj, payload)) {
        // and its date falls within the requested range
        if (
          moment(
            jsonObj[payload.fields.date_fieldname],
            self.date_mask[payload.config.type]
          ).isSameOrAfter(start)
        ) {
          // console.log("saving data for location =" + jsonObj[payload.fields.location_fieldname]);
          payload.location[
            jsonObj[payload.fields.location_fieldname]
          ].push(jsonObj);
        }
      }
    } catch (error) {
      console.log("location undefined =" + error);
    }
  })
  .then(
    (result) => {
      // all done, tell the topmost function we completed
      if (payload.config.debug)
        console.log("done processing file id=" + payload.id);
      // get the 1st promise resolver (if any), and send the data back
      if (payload.resolve.length) {
        payload.resolve.shift()({
          data: payload.location,
          payload: payload,
        });
      }
    },
    (error) => {
      console.log("error on cvt file = " + payload.filename + ": " + error);
    }
  );
It's possible that multiple charters request data concurrently, so I use a promise mechanism to allow only one of them to go fetch the data (13 MB):
-rw-r--r-- 1 pi pi 13668624 Dec 22 08:06 countries-rawdata-12-22-2020.csv
and the others wait. As soon as the data arrives and is written to disk (in the fs async write callback), I resolve all the promises for the waiting tasks and let them access the data file (overlapping with each other wherever async services allow).
At this point, more than one invocation of csvtojson can be active in the same process space.
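The promise mechanism described above can be sketched roughly as follows. This is a minimal illustration of the "one fetcher, many waiters" pattern, not the actual module code; the names `getData`, `fetchData`, and `waiters` are hypothetical:

```javascript
// Sketch: first caller triggers the fetch; later callers just queue a
// resolver and share the same result when the fetch completes.
const waiters = [];    // pending promise resolvers (like payload.resolve)
let fetching = false;  // true while one task owns the download

function getData(fetchData) {
  return new Promise((resolve, reject) => {
    waiters.push(resolve);
    if (!fetching) {
      fetching = true;
      fetchData().then((data) => {
        fetching = false;
        // resolve every waiting task with the same data
        while (waiters.length) waiters.shift()(data);
      }, reject);
    }
  });
}
```

Once the waiters are all released, each one can start its own csvtojson parse, which is how several parses end up running concurrently.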
I have a user with two instances on a Pi 3, where I can reproduce the problem:
we get a consistent garbage-collection trap, almost always in rowSplit.
<--- Last few GCs --->
[12157:0x5c65000] 29043 ms: Mark-sweep 230.9 (231.9) -> 230.6 (231.9) MB, 671.7 / 0.1 ms (+ 0.1 ms in 12 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 693 ms) (average mu = 0.097, current mu = 0.048) allocation fai
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0x2265d24]
Security context: 0x4e68cead <JSObject>
1: split [0x4e6841cd](this=0x49d89229 <String[154]: SSD,Africa,South Sudan,2020-04-06,1.0,0.0,,,,,0.089,0.0,,,,,,,,,,,,,,,,,,,,,,,,,73.15,11193729.0,,19.2,3.441,2.032,1569.888,,280.775,10.43,,,,,57.85,0.388>,0x381044f1 <String[#1]: ,>)
2: parseMultiLines [0x3f772ed1] [/home/pi/MagicMirror/modules/MyCovid19/node_modules/csvtojson/v2/rowSplit.js:~203] [pc=0x427...
I cannot recreate it on an old Intel i7-2700K with 32 GB of RAM running Ubuntu, even with 10 charting tasks.
I can increase the Node heap with NODE_OPTIONS="--max_old_space_size=2048",
but this doesn't change the result, so this 'sounds' like a global heap variable being overwritten unexpectedly.
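One way to check whether the heap really fills up under concurrent parses is to log V8 memory stats while the charters run. A minimal sketch (the 5-second interval is arbitrary):

```javascript
// Sketch: periodically log heap usage so heap growth across concurrent
// csvtojson invocations shows up in the console (values in MB).
const MB = 1048576;
const timer = setInterval(() => {
  const m = process.memoryUsage();
  console.log(
    "heapUsed=" + (m.heapUsed / MB).toFixed(1) + "MB " +
    "heapTotal=" + (m.heapTotal / MB).toFixed(1) + "MB " +
    "rss=" + (m.rss / MB).toFixed(1) + "MB"
  );
}, 5000);
timer.unref(); // don't keep the process alive just for monitoring
```

If heapUsed climbs roughly in proportion to the number of active parses, the crash is ordinary memory exhaustion on the Pi's small heap rather than corruption.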
Debugging guidance welcomed.
This code has been running fine since late March. The user and I have both started with fresh system installs and installed all new code: Node 10.23.0 and npm 6.14.8. Pi memory:
free -m
total used free shared buff/cache available
Mem: 925 210 398 47 317 609
Swap: 2047 7 2040
As a workaround, I changed my app to allow only one invocation of csvtojson at a time; no more heap crash.
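The serialization workaround can be sketched as a simple promise-chain mutex; this is an illustration of the idea, not the actual app code, and `parseSerially` is a hypothetical name:

```javascript
// Sketch: queue parse tasks on a single promise chain so only one
// csvtojson invocation holds its working set at a time.
let chain = Promise.resolve();

function parseSerially(task) {
  // run the task after everything already queued, whether or not a
  // previous task failed
  const run = chain.then(task, task);
  chain = run.catch(() => {}); // keep the chain alive on errors
  return run;
}
```

Each charter would wrap its csvtojson call, e.g. `parseSerially(() => csvtojson().fromFile(name).subscribe(...))`, trading some latency for a bounded peak heap on the Pi.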