sax-wasm

Extract values of result for further processing

Open • L-U-C-K-Y opened this issue 1 year ago • 1 comment

Is your feature request related to a problem? Please describe.

Hi @justinwilaby,

Thanks for this great lib! I just tried it out and it is very fast.

We have a use case where we need to go through large XML files (100 MB+) and process the elements within them.

I am looking for a way to stream through the XML, find the `<Transaction></Transaction>` elements, emit their content as events to a queue, and process them later with XPath. By emitting smaller chunks of the XML, we avoid maxing out our RAM during parsing.

I was able to use `SaxEventType.CloseTag` to process the XML and find the start/end positions of each element, for example:

```json
{
  "openStart": {
    "line": 337741,
    "character": 10
  },
  "openEnd": {
    "line": 337741,
    "character": 37
  },
  "closeStart": {
    "line": 338092,
    "character": 10
  },
  "closeEnd": {
    "line": 338092,
    "character": 26
  }
}
```
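
A minimal sketch of what those positions mean, assuming (none of this is confirmed in the thread) that `line` and `character` are 0-based, that `character` counts JavaScript string units, and that each end position is exclusive. The `TextPosition`/`TagPositions` types only mirror the field names shown above; `lineStartOffsets`, `toOffset`, `elementText` and `innerText` are hypothetical helpers, not part of sax-wasm. The full element spans `openStart` to `closeEnd`, while its inner content spans `openEnd` to `closeStart`.

```typescript
// Mirrors the fields reported with a close tag above. Assumed: 0-based
// line/character, character counted in JS string units, exclusive ends.
interface TextPosition {
  line: number;
  character: number;
}

interface TagPositions {
  openStart: TextPosition;
  openEnd: TextPosition;
  closeStart: TextPosition;
  closeEnd: TextPosition;
}

// Record the string index at which every line starts.
function lineStartOffsets(text: string): number[] {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "\n") starts.push(i + 1);
  }
  return starts;
}

// Convert a line/character position into an absolute string index.
function toOffset(starts: number[], pos: TextPosition): number {
  return starts[pos.line] + pos.character;
}

// Whole element, from "<Transaction ..." up to the end of "</Transaction>".
function elementText(text: string, starts: number[], tag: TagPositions): string {
  return text.slice(toOffset(starts, tag.openStart), toOffset(starts, tag.closeEnd));
}

// Only what sits between the open tag and the close tag.
function innerText(text: string, starts: number[], tag: TagPositions): string {
  return text.slice(toOffset(starts, tag.openEnd), toOffset(starts, tag.closeStart));
}
```

Because this keeps the whole document in one string, it only illustrates the mapping from positions to text; for 100 MB+ files the same ranges would need to be recovered without loading the entire file.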

Do you think it would be possible with your library to extract an element's XML content on `SaxEventType.CloseTag` and emit it to a queue?

Thanks a lot

Describe the solution you'd like

The ability to extract an element's XML content when its tag is closed.

Describe alternatives you've considered

I was thinking of getting the start/end positions with your library and then streaming through the XML file a second time to extract the content.
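
The two-pass alternative could look roughly like the sketch below: collect each element's `openStart`/`closeEnd` positions during the first pass, then feed the file line by line into a small extractor that emits an element's text as soon as its last line has been seen. `RangeExtractor`, `ElementRange` and `pushLine` are hypothetical names. The sketch assumes 0-based, exclusive-end positions, and ranges that are sorted and non-overlapping (nested `<Transaction>` elements would break it).

```typescript
interface TextPosition {
  line: number;      // assumed 0-based
  character: number; // assumed 0-based
}

// One element to extract: from its openStart to its closeEnd (exclusive).
interface ElementRange {
  start: TextPosition;
  end: TextPosition;
}

class RangeExtractor {
  private readonly ranges: ElementRange[]; // sorted, non-overlapping
  private next = 0;                        // index of the next range to finish
  private lineNo = 0;                      // 0-based number of the incoming line
  private pieces: string[] | null = null;  // lines of a range still in progress

  constructor(ranges: ElementRange[]) {
    this.ranges = ranges;
  }

  // Feed the next line (without its terminator); returns every range that
  // was completed on this line.
  pushLine(line: string): string[] {
    const done: string[] = [];
    while (this.next < this.ranges.length) {
      const { start, end } = this.ranges[this.next];
      if (this.pieces === null) {
        // Not inside a range: stop unless the next one starts on this line.
        if (start.line !== this.lineNo) break;
        if (end.line === this.lineNo) {
          // The whole range fits on this line.
          done.push(line.slice(start.character, end.character));
          this.next++;
          continue;
        }
        // The range continues on later lines: start buffering.
        this.pieces = [line.slice(start.character)];
        break;
      }
      if (end.line === this.lineNo) {
        // Last line of a buffered range.
        this.pieces.push(line.slice(0, end.character));
        done.push(this.pieces.join("\n"));
        this.pieces = null;
        this.next++;
        continue;
      }
      // Middle line of a buffered range.
      this.pieces.push(line);
      break;
    }
    this.lineNo++;
    return done;
  }
}
```

In Node.js the lines could come from `readline.createInterface({ input: fs.createReadStream(file), crlfDelay: Infinity })`, whose async iterator yields each line without its terminator; every string returned by `pushLine` could then be pushed to the queue. Only one element is buffered at a time, so memory stays bounded by the largest element plus the position list from the first pass. Since lines are rejoined with `\n`, CRLF line endings come back as LF.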

Additional context

Cheers!

L-U-C-K-Y, Sep 23, 2022, 13:09

Hi @L-U-C-K-Y,

I do understand what you are asking and it's an interesting feature to consider.

It can certainly be done by exposing a pointer into the wasm memory in each event. When the parser reports a close tag, the pointer can be used to slice the memory and either queue the result as bytes for later processing or pass it through a text decoder to get a human-readable version.
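
The pointer idea above is not an existing sax-wasm API, so the following is only a rough sketch under that assumption: suppose each close-tag event carried hypothetical byte offsets `[start, end)` of the element inside the wasm linear memory (a plain `Uint8Array` stands in here for a view over the `WebAssembly.Memory` buffer). The detail that matters is copying with `slice()` rather than taking a `subarray()` view, since the parser would likely reuse that memory for the next chunk; the copies can be queued as bytes and decoded with `TextDecoder` only when a consumer needs the text. `onCloseTag`, `nextElementText` and the in-memory `queue` are illustrative names.

```typescript
// Decoder shared by all consumers; the XML is assumed to be UTF-8.
const decoder = new TextDecoder("utf-8");

// In-memory FIFO of element bytes, standing in for a real message queue.
const queue: Uint8Array[] = [];

// Hypothetical close-tag hook: `memory` stands in for a view over the wasm
// memory, and [start, end) for byte offsets the parser would report.
function onCloseTag(memory: Uint8Array, start: number, end: number): void {
  // slice() copies the bytes; subarray() would only be a view that changes
  // when the underlying memory is overwritten.
  queue.push(memory.slice(start, end));
}

// Consumer side: decode the oldest queued element, or undefined if empty.
function nextElementText(): string | undefined {
  const bytes = queue.shift();
  return bytes === undefined ? undefined : decoder.decode(bytes);
}
```

Decoding whole elements rather than arbitrary chunks also avoids splitting a multi-byte UTF-8 sequence across two decode calls.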

Let me crunch on this for a bit. Thank you for the suggestion!

justinwilaby, Sep 23, 2022, 20:09