node-xml
node-xml copied to clipboard
Time to parse file is O(n^2) performance wrt to file size
I have an xml file containing a table. The parsing speed is very slow. I notice a pattern though:
110s for parsing the full file 25s when half the rows are removed Seems like textbook quadratic growth.
I also observe the per row processing time decrease as the parser advances through more rows; ie rows occurring later in the file take less time.
Just tested using fs.createReadStream to read the file in chunks; parsing the whole file now takes 10s (which is still bad for 450k in my opinion)
Adding this code:
var fc = this.m_xml.charAt(this.m_iP);
if (fc !== '<' && fc !== '&') {
return this._parseText (this.m_iP);
}
at the top of XMLP.prototype._parse cuts down the time by half. I have a 1MB file which used to take 4+ minutes and now takes 2m 12s. Problem is there's lots of repeated scanning using indexOf which results in the O(n^2) performance.
pcapr is right on with this issue. I added some profiling and this lib took 3-4 sec to parse a small (170KB) xml file with 99% of the work being indexOf calls. Give the n^2 growth, this will lead to a performance nightmare for medium to large xmls. This is can be improved significantly by passing the xml in chunks rather than a big string, but we've had so many issues already that it's easier just to drop the library.