
transformStream doesn't seem to be working correctly

Open · evoactivity opened this issue 1 year ago · 2 comments

Hello 👋

I went looking for a fast xml library for a particular use case I have and boy did I ever find one! Amazing work Tobias! I can use streams with it too?! Superb!

Unfortunately, using the code from the readme doesn't seem to work for me :(. I'm simply told that xmlStream is not async iterable. I'm on Node v18.11.0.

const xmlStream = fs
    .createReadStream("./my-file.xml")
    .pipe(txml.transformStream());

  for await (let element of xmlStream) {
    // your logic here ...
  }
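
For completeness, here is a self-contained version of that snippet, with the imports added and the loop wrapped in an async function so that for await is legal (the file path is just an example):

const fs = require("fs");
const txml = require("txml");

(async () => {
  const xmlStream = fs
    .createReadStream("./my-file.xml")
    .pipe(txml.transformStream());

  // I'd expect each iteration to yield one parsed element
  for await (let element of xmlStream) {
    console.log(element);
  }
})();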

I tried attaching event handlers like this:

const xmlStream = fs
    .createReadStream("./my-file.xml")
    .pipe(txml.transformStream());

xmlStream.on('data', (data) => { console.log(data) });

That didn't work either; it just hangs. But if I remove the pipe, I can see the chunks coming in from the read stream:

const xmlStream = fs
    .createReadStream("./my-file.xml");

xmlStream.on('data', (data) => {console.log(data.toString())});

Not sure what else I can try from my side.
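
If it helps, this is the kind of extra instrumentation I can add on my side (error/close/end handlers on both streams, just a sketch):

const readStream = fs.createReadStream("./my-file.xml");
const xmlStream = readStream.pipe(txml.transformStream());

// log low-level stream events to see where things stall
readStream.on("error", (err) => console.error("read error", err));
xmlStream.on("error", (err) => console.error("transform error", err));
xmlStream.on("close", () => console.log("transform close"));
xmlStream.on("end", () => console.log("transform end"));
xmlStream.on("data", (element) => console.log("element", element));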

evoactivity · Apr 01 '23 01:04

So as it turns out, attaching event handlers does work, but the data event doesn't seem to fire for large files. For small files that can be parsed in one chunk, it seems to work.

I have three files I'm using to test: 3 KB, 31 MB, and 129 MB.

They all parse in under 1 second using txml.parse(), but the 31 MB file takes over a minute to stream, and I gave up waiting for the 129 MB file.

I set up a nanobench benchmark:

// ..imports, loading file with readFileSync etc

bench("read stream", (b) => {
  b.start();

  const xmlStream = fs.createReadStream("my-file.xml");

  xmlStream
    .on("data", (data) => {
      process.stdout.write(".");
    })
    .on("end", () => {
      process.stdout.write("\n");
      b.end();
    });
});

bench("txml", function (b) {
  b.start();
  txml.parse(epg);
  b.end();
});

bench("txml-streaming", (b) => {
  b.start();

  const xmlStream = fs
    .createReadStream("./src/my-file.xml")
    .pipe(txml.transformStream(0));

  xmlStream
    .on("data", (data) => {
      console.log("data");
    })
    .on("end", () => {
      b.end();
      console.log("end");
    });
});

This is the output.

3 KB file:

NANOBENCH version 2
> /Users/liam/.volta/tools/image/node/18.11.0/bin/node test3.js

# read stream
.
ok ~1.17 ms (0 s + 1165250 ns)

# txml
ok ~593 μs (0 s + 593459 ns)

# txml-streaming
data
ok ~2.24 ms (0 s + 2242958 ns)

end
all benchmarks completed
ok ~4 ms (0 s + 4001667 ns)

31 MB file (the data event doesn't fire, so no "data" is logged):

NANOBENCH version 2
> /Users/liam/.volta/tools/image/node/18.11.0/bin/node test3.js

# read stream
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ok ~47 ms (0 s + 46826708 ns)

# txml
ok ~285 ms (0 s + 285426250 ns)

# txml-streaming
ok ~1.37 min (82 s + 728399833 ns)

end
all benchmarks completed
ok ~1.38 min (83 s + 60652791 ns)

129 MB file (I gave up waiting for it to fire the end event):

NANOBENCH version 2
> /Users/liam/.volta/tools/image/node/18.11.0/bin/node test3.js

# read stream
...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ok ~167 ms (0 s + 167206792 ns)

# txml
ok ~975 ms (0 s + 974563166 ns)

# txml-streaming
^C⏎

I killed it after 13m13s.

evoactivity · Apr 01 '23 15:04

I think you need to set the offset. Usually XML files start with some metadata or at least with a root element. The offset puts the cursor directly behind the opening root element, where the actual data begins. I see you have set the offset to 0; then the parser doesn't know where an element ends and you get everything at once, without the advantage of a stream. I think for the large files you just run out of memory?
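
For example, something like this should work better (the prolog and root element name are only placeholders, adjust them to whatever your file actually starts with):

// skip the XML declaration and the opening root tag so parsing
// starts at the first child element (placeholder names)
const prolog = '<?xml version="1.0" encoding="UTF-8"?>\n<root>';

const xmlStream = fs
  .createReadStream("./src/my-file.xml")
  .pipe(txml.transformStream(prolog.length));

xmlStream.on("data", (element) => {
  // each `element` should be one parsed child of <root>
});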

TobiasNickel · May 13 '23 00:05