saxpath icon indicating copy to clipboard operation
saxpath copied to clipboard

Could saxpath be duplex stream, on top of emitting 'match' events?

Open rprieto opened this issue 10 years ago • 9 comments

Maybe a basic question, but is there a reason saxpath is a writable stream only, and emits a match event? It would be nice if it was a Duplex stream, and we could also do:

var fileStream = fs.createReadStream('test/bookstore.xml');
var saxParser  = sax.createStream(true);
var streamer   = new saxpath.SaXPath(saxParser, '//book');

fileStream.pipe(saxParser).pipe(process.stdout);

Or of course, pipe into something more complex than stdout, like a JSON->XML transform stream.

rprieto avatar Jan 21 '15 23:01 rprieto

Thank you for your question and a great idea. The reason why it works this way was that it was simple enough but worked for the situation.

Feel free to adapt/extend saxpath and send a pull request. Maybe you can encapsulate the streamer in a stream-like-object. Unfortunately, I cannot do it myself at the moment due to other projects.

StevenLooman avatar Jan 22 '15 18:01 StevenLooman

Thanks. I have a working branch with the following API:

var fileStream = fs.createReadStream('test/bookstore.xml');
var streamer = new saxpath.Stream('//book', /* optional recorder */);
// print all matching XML chunks to stdout
fileStream.pipe(streamer).pipe(process.stdout);

It can technically be a replacement for new saxpath.SaXPath(...), since you just have to listen to the data event instead of match.

However this means adding a dependency to sax so that saxpath.Stream can create its own saxParser. Otherwise it would be quite cluncky because you wouldn't know what to pipe to (the saxParser you injected, or the saxpath.Stream itself?).

Is that OK, or would you rather keep injecting it? The saxParser events you bind to are quite specific, so saxpath.Stream creating its own might even prevent people from injecting random other parsers.

rprieto avatar Jan 23 '15 04:01 rprieto

Great idea. I had to write a kind of stream wrapper like this when I used saxpath. However, note that this can become a bit weird if you use a custom recorder that emits objects instead of strings (check the objectMode property of Readable and Writable streams).

PaulMougel avatar Jan 23 '15 10:01 PaulMougel

Sorry for not responding earlier. The example looks easy enough. Can you create a pull request or push your code to github? However, as you say, you'd be bound to the sax library. I'd like to keep the library small and mean, i.e., not adding too many dependencies or make the library dependent on one specific xml/sax library. But its not an absolute rule.

Be sure to keep the comment from @PaulMougel in mind. Maybe it should be checked/enforced in code, if that is possible.

Also, what happens if you stream in a document which will match elements inside elements? E.g.,

<library>
  <book>
    <title>Book 1</title
    <book>
        <title>Sub-book 1</title>
    </book>
  </book>
</library>

I'm not sure about the best solution here. Maybe you can create another library (or build it in your own project) which does specifically this. Also, I'd be happy to include it or link to it, so others can use it as well.

StevenLooman avatar Jan 26 '15 20:01 StevenLooman

No problem, thanks for your reply. I'll push my PR today. The idea was just to provide another external interface without changing any of the current behaviour. There's probably 2 parts to it:

  • exposing a readable stream

Simplifying a little, this is pretty much having a data event on top of the match event. As far as I understand, in your nested example, it would trigger just as many times as match would. The main advantage is it allows chaining streams, e.g.

new saxpath.Stream(...)
.pipe(xmlToJson())
.pipe(concat(function(array) {
  // array of JSON elements
})

This retains the main benefit of non-blocking processing, even if the data arrives small chunks at a time (e.g. network request).

Good point @PaulMougel about objectMode. What a lot of stream libraries do is accept extra args that get passed to the stream constructor. Would that be OK?

new saxpath.Stream(..., {objectMode: true})

I'll add some unit tests around this case.

  • exposing a writable stream

This one is not as important. I just thought if we have a readable stream already, it could be nice to pipe data straight into it too, to avoid the double-piping:

// double piping
fileStream.pipe(saxParser);
streamer.pipe(process.stdout);

// streamlined :)
fileStream.pipe(streamer).pipe(process.stdout)

It could also simplify usage for a lot of people who follow the README and end up installing sax because saxpath requires it - and maybe wouldn't think of trying another parser (what's the advantage? will it definitely work?). I understand though if you don't want to add the dependency, so it's 100% your call :smiley:

rprieto avatar Jan 26 '15 22:01 rprieto

Good point @PaulMougel about objectMode. What a lot of stream libraries do is accept extra args that get passed to the stream constructor. Would that be OK?

Sounds good to me! :+1:

PaulMougel avatar Jan 27 '15 06:01 PaulMougel

You are right, frantically not wanting to 'bind' to a specific sax library does not make any sense. The pull request looks mostly ok. Can you add/change the following things:

  • remove the mocha 1.12.0 dependency from package.json (https://github.com/TabDigital/saxpath/commit/b3b19048b080884093fb819b97472048d93a9f8e#diff-b9cfc7f2cdf78a7f4b91a753d10865a2R19)
  • make node engine 0.10 mandatory, upping from 0.8, in package.json
  • add an example of how to use you addition to the readme
  • create a contributors section at the bottom of the readme and add yourself and @PaulMougel to it

Then I can/will merge the pull request

StevenLooman avatar Jan 28 '15 21:01 StevenLooman

Also, the objectMode addition is ok.

StevenLooman avatar Jan 28 '15 21:01 StevenLooman

Thanks @StevenLooman, will do all the above. I added mocha because right now it assumes you have it installed globally (and I don't), but that's a matter of preference so I'll remove it.

For now, the PR actually injects the sax parser (new saxpath.Stream(saxParser, ...)), should I keep it like this or new-it-up inside the Stream?

rprieto avatar Jan 28 '15 22:01 rprieto