Is there a way to consume XML elements lazily?

Open lowecg opened this issue 4 years ago • 4 comments

I've been using Eximia and have been very pleased with its performance and simplicity.

However, I'd like to use Eximia to process large documents in a memory-constrained environment (AWS Lambda).

The parser seems to process all of the XML input eagerly, which consumes a lot of memory and places a hard limit on the size of input that can be processed. For example, if I load a 29 MiB input document, my Lambda reports a memory usage of 780 MiB.

Would it be possible to have an option to consume the stream of XML tokens lazily, say via a lazy seq?

lowecg • Jun 05 '21 06:06

The performance and (implementation) simplicity stem largely from not supporting laziness. Lazy parsing is obviously possible, but I am not confident I could do it with substantially less overhead than data.xml. Honestly, I would just use data.xml if lazy parsing is a must. Unfortunately, the two libraries are not 100% compatible, so I have to admit it would be easier to just toggle an option.

nilern • Jun 07 '21 07:06

Thanks for looking at this.

I take your point regarding going fully lazy.

There might be a suitable balance between pure laziness and eagerness.

My use case, which I believe is quite common, is to process a document that has many repeated child nodes under a single parent:

<parent>
  <child />
  <child />
  <!-- ... lots of child nodes ... -->
  <child />
</parent>

If there's a way to specify a path that denotes the child node, then a sequence of eagerly processed child nodes might be enough to strike a balance between laziness and performance.
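
To make the idea concrete, here is a minimal sketch in Python rather than Clojure. It is not Eximia's API: `iter_children` and its `path` argument are hypothetical names, and it relies only on the standard library's `xml.etree.ElementTree.iterparse`. Each matching child is built eagerly as a complete subtree, while the sequence of children is produced lazily by a generator.

```python
import io
import xml.etree.ElementTree as ET


def iter_children(source, path):
    """Yield each element whose tag path from the root equals `path`.

    `path` is a list of tag names, root first (a hypothetical convention
    chosen for this sketch). Every yielded element is a fully built
    subtree; it is emptied once the consumer asks for the next one.
    """
    stack = []  # tags of the currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:  # "end": the element and all of its descendants are complete
            if stack == path:
                yield elem
                # Drop the child's attributes, text and descendants. The
                # emptied element itself stays attached to its parent.
                elem.clear()
            stack.pop()


doc = b"<parent><child id='1'/><child id='2'/><other/><child id='3'/></parent>"
ids = [c.get("id") for c in iter_children(io.BytesIO(doc), ["parent", "child"])]
print(ids)  # ['1', '2', '3']
```

The consumer has to finish with each child before requesting the next one, because the element is cleared as soon as the generator resumes.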

lowecg • Jun 08 '21 10:06

I have a half-baked XML parser combinator library. That approach should enable your example and much more with even less memory usage.

But then I thought it was probably better to just make the 90% use case more efficient, so I released Eximia instead.

I am still thinking about memory reduction and parse-time transformations for both XML and JSON. There doesn't seem to be a whole lot of demand, but maybe people just don't know what they are missing :shrug:

nilern • Jun 08 '21 12:06

Great, I'll have a look at Esco.

And thank you again for looking into this - it's very much appreciated.

lowecg • Jun 08 '21 12:06