node-xml2js
Parse a section at a time - very large file
Hi,
Sorry if I'm asking this question in the wrong place:
I have a very large XML file - 10GB - and I'm using the following code to parse it:
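const fs = require('fs');
const xml2js = require('xml2js');

const parser = new xml2js.Parser();
let pData; // holds the parsed result for addVertices

function parsexml(callback) {
    // Reads the whole file into memory before parsing
    fs.readFile(__dirname + '/sample_larger.xml', function (err, data) {
        if (err) return callback(err);
        parser.parseString(data, function (err, result) {
            if (err) return callback(err);
            pData = result;
            console.log('====> Done Parsing XML');
            callback(null);
        });
    });
}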
Then I have a second function, addVertices, which uses pData to make a Gremlin query that adds the data to my Azure Cosmos DB.
My question: copying the 10GB into the variable pData seems like a bit of a waste. Is it possible instead to parse one section at a time, for example by specifying the XML tag I'm after?
Assume my XML looks something like:
<songs>
    <song>
        <!-- details I want -->
    </song>
</songs>
Is there something like:
function parsexml(callback) {
    fs.readFile(__dirname + '/sample_larger.xml', function (err, data) {
        parser.parseSection("song", function (err, result) {
            // do my gremlin query into my Cosmos DB
            callback(null);
        });
    });
}
Any advice/help appreciated.
Thanks
copying the 10GB into that variable pData - seems like a bit of a waste
Remember, when you do pData = result, you're not copying anything: pData then holds a reference to the same object as result, so both names point to the same object.
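To illustrate the point about references, here is a minimal sketch. The small `result` object is a made-up stand-in for whatever the parser would hand to the callback, not real parser output.

```javascript
// Stand-in (assumption) for the object that parseString passes to its callback.
const result = { songs: { song: [{ title: 'A' }] } };

// Assignment stores a reference to the same object; no data is duplicated.
const pData = result;

// Both names refer to one object.
console.log(pData === result);

// A change made through one name is visible through the other.
pData.songs.song.push({ title: 'B' });
console.log(result.songs.song.length);
```

Note that this only shows the assignment itself is free. The parsed tree still has to fit in memory once, which is what parsing a section at a time would avoid.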
Right - so nothing really to worry about.
Nevertheless, is it possible to parse a block of XML at a time - based on a tag I specify?
@psmod2 I had the same problem. I've made a function that does this and follows the xml2js format (it's pretty slow, but doesn't run out of memory):
https://github.com/tfso/njs-tfso-xml/blob/master/src/streamParse.js
Usage: https://github.com/tfso/njs-tfso-xml/blob/master/test/testStreamParse.js
It returns results in the same format as the xml2js version here:
https://github.com/tfso/njs-tfso-xml/blob/master/src/parse.js
Would be awesome to get this functionality in xml2js as well.
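For readers who want a feel for the section-at-a-time idea, here is a rough sketch using only Node's built-in modules. It is not the linked streamParse implementation and not an xml2js API: `createSectionSplitter` and `streamSections` are hypothetical names. It is also deliberately naive: it finds blocks by plain string search, so it assumes the element is never nested inside itself, is not self-closing, and that its tag text never appears inside comments or CDATA.

```javascript
const fs = require('fs');

// Finds the next "<tag" that is followed by ">" or whitespace, so that
// "<song" does not match "<songs". Returns -1 if there is none.
function findOpenTag(buffer, openTag) {
  let from = 0;
  for (;;) {
    const i = buffer.indexOf(openTag, from);
    if (i === -1) return -1;
    const next = buffer[i + openTag.length];
    // undefined: the chunk ended right after the tag name, so wait for more input.
    if (next === undefined || next === '>' || /\s/.test(next)) return i;
    from = i + 1;
  }
}

// Hypothetical helper: returns a write(text) function that accepts text in
// arbitrary chunks and calls onSection with each complete <tagName>...</tagName> block.
function createSectionSplitter(tagName, onSection) {
  const openTag = '<' + tagName;
  const closeTag = '</' + tagName + '>';
  let buffer = '';

  return function write(text) {
    buffer += text;
    for (;;) {
      const start = findOpenTag(buffer, openTag);
      if (start === -1) {
        // Keep only a short tail that might be the beginning of an opening tag.
        buffer = buffer.slice(-openTag.length);
        return;
      }
      const end = buffer.indexOf(closeTag, start);
      if (end === -1) {
        // Section is incomplete: drop everything before it and wait for more.
        buffer = buffer.slice(start);
        return;
      }
      onSection(buffer.slice(start, end + closeTag.length));
      buffer = buffer.slice(end + closeTag.length);
    }
  };
}

// Hypothetical wrapper: streams a file and reports each section as it completes.
function streamSections(filePath, tagName, onSection, done) {
  const write = createSectionSplitter(tagName, onSection);
  const stream = fs.createReadStream(filePath, { encoding: 'utf8' });
  stream.on('data', write);
  stream.on('error', done);
  stream.on('end', function () { done(null); });
}

// Example call (not executed here; the path is the one from the question):
// streamSections(__dirname + '/sample_larger.xml', 'song', function (sectionXml) {
//   parser.parseString(sectionXml, function (err, song) { /* Gremlin query */ });
// }, function (err) { if (err) console.error(err); });
```

Memory use stays around the size of one section plus one read chunk. Because `onSection` is called synchronously here, a real import would probably need to call `stream.pause()` and `stream.resume()` around each database write so sections are not produced faster than they can be stored.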
Also see https://github.com/Leonidas-from-XIV/node-xml2js/issues/137 and https://github.com/Leonidas-from-XIV/node-xml2js/issues/102
@psmod2 any workaround?