node-xml2js

Parsing async

Open fairhat opened this issue 9 years ago • 8 comments

Hey,

I've been using this module to convert XML files to JSON, split the object into several parts, and save it to MongoDB. However, I am using the latest version with async:true in the config, yet the whole parsing process still seems to block the thread completely.

Is the async config broken, or am I doing something wrong?

fairhat avatar Jul 10 '15 12:07 fairhat

No, you're not missing anything. I thought that sax.js works in an async fashion, but it turns out the underlying EventEmitter is blocking.
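
For anyone wondering what "blocking" means here: Node's EventEmitter calls its listeners synchronously inside emit(), so an event-based parser does not yield to the event loop just because it emits events. A minimal sketch using only Node's built-in events module (the 'tag' event name is made up for illustration and is not a sax.js event):

```javascript
const { EventEmitter } = require('events');

const emitter = new EventEmitter();
const order = [];

// The listener is invoked inside emit(), on the same call stack.
emitter.on('tag', (name) => order.push(`listener:${name}`));

order.push('before emit');
emitter.emit('tag', 'a'); // the listener has already run when emit() returns
order.push('after emit');

console.log(order);
// [ 'before emit', 'listener:a', 'after emit' ]
```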

Leonidas-from-XIV avatar Oct 09 '15 17:10 Leonidas-from-XIV

Yeah, I wrapped it in an isolated process in a Node.js cluster. I can share the code if anyone actually needs it.

fairhat avatar Oct 09 '15 17:10 fairhat

It is certainly one method, but the concept of spawning a process (hopefully, soon, a worker) to do this is something I would +1.

I'd be interested in benchmarks on large XML datasets and the building/parsing of them. See if it's even worth it.

Having a new process do this might not be worth it until workers are available (to cut down on IPC latency/mem usage.)
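
For later readers: Node.js has since added the worker_threads module, which makes the worker idea concrete. Below is a rough sketch of offloading CPU-heavy work to a worker thread, not a tested recipe for this library. `countTags` is a hypothetical stand-in for the real parse so the example runs without dependencies; in practice the worker would presumably require xml2js and call parseString there. `parseInWorker` and `workerSource` are illustrative names as well.

```javascript
const { Worker } = require('worker_threads');

// Worker body kept as a string so the sketch fits in one file.
// `countTags` is a hypothetical stand-in for the CPU-heavy parse.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const countTags = (xml) => (xml.match(/<[a-zA-Z][^>]*>/g) || []).length;
  parentPort.postMessage({ tags: countTags(workerData) });
`;

// Runs the work on a separate thread and resolves with the worker's message.
function parseInWorker(xml) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: xml });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

const done = parseInWorker('<a><b/><c></c></a>').then((result) => {
  console.log(result.tags, 'opening tags counted in the worker');
  return result;
});
```

The result still has to be copied back to the main thread via postMessage, so for very large documents the transfer cost is worth benchmarking too.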

Edit: Spelling / Grammar.

tflanagan avatar Oct 12 '15 23:10 tflanagan

Well, if your xml-parsing function is tied to an api-server, it is kind of always worth it, no matter how "fast" it is. We were parsing 50MB+ XML files and the processing time was mostly 15-500+ seconds, blocking the whole api while working.

fairhat avatar Nov 11 '15 15:11 fairhat

500+ seconds?! Even with a new process so that other API calls aren't blocked, that would kill any front-end usability.

Sounds like you should try to chunk those down to smaller XML docs.

tflanagan avatar Nov 11 '15 15:11 tflanagan

Well, we're parsing the XML, splitting the JSON into smaller objects, and throwing the data into MongoDB. After that, the app doesn't need to be updated unless the XML changes, which is about twice per month on average (depending on our client). The changes are small anyway, so they can live with the cached version until the update is finished. So the client actually just clicks on "update" and forgets about it until it is finished (and he gets a notification). Not saying it's the greatest way to deal with that, it's just that I don't have much of a choice when it comes to changing the XML files. :-|

Edit: I should mention the XML docs are generated with some really old enterprise software.

fairhat avatar Nov 11 '15 17:11 fairhat

@fairhat not sure if this will solve your exact problem, but using native Promises with async/await might be the solution you need. See below:

let result = await new Promise((resolve, reject) =>
	parser.parseString(xml, (err, result) => {
		if (err) reject(err);
		else resolve(result);
	})
);

wmelton avatar Mar 28 '21 18:03 wmelton

Hello wmelton, thanks for your suggestion. The issue is 5 years old and I have changed jobs since. The problem wasn't related to Promise/async/await syntax: even when you're using promises, the operation was still blocking the main thread.
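
To illustrate the point, here is a small sketch showing that wrapping a synchronous, callback-style function in a Promise does not move its work off the main thread. `blockingParse` is a hypothetical stand-in that burns CPU for about 50 ms before calling back; it is not the xml2js API.

```javascript
// Hypothetical stand-in for a callback-style parser that does all of its
// work synchronously before invoking the callback.
function blockingParse(input, callback) {
  const end = Date.now() + 50; // burn ~50 ms of CPU on the calling thread
  while (Date.now() < end) { /* busy loop */ }
  callback(null, input.length);
}

const events = [];

// Scheduled first, with no delay, yet it cannot run until the thread is free.
setTimeout(() => events.push('timer'), 0);

const parsed = new Promise((resolve, reject) => {
  // The executor runs synchronously, so the busy loop runs right here.
  blockingParse('<root/>', (err, result) => (err ? reject(err) : resolve(result)));
  events.push('parse finished');
});

// Wait a little so the pending timer gets its turn, then show the order.
const finished = parsed
  .then(() => new Promise((resolve) => setTimeout(resolve, 10)))
  .then(() => {
    console.log(events);
    return events;
  });
```

The zero-delay timer only fires after the parse has finished, which is the same effect an API server would see for every other pending request.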

Not sure if the repo has been updated since.

fairhat avatar Mar 28 '21 18:03 fairhat