N3.js icon indicating copy to clipboard operation
N3.js copied to clipboard

N-Quads are written if N-Triples are requested and graph is supplied

Open thombohlk opened this issue 6 years ago • 21 comments

When using a Writer with format N-Triples and adding quads that include a graph, the writer will output in N-Quads format.

Code (using N3.js version 1.0.4):

var N3 = require('../N3');
const writer = new N3.Writer({ format:'N-Triples' });
writer.addQuad(
  N3.DataFactory.namedNode('http://example.org/cartoons#Tom'),
  N3.DataFactory.namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
  N3.DataFactory.namedNode('http://example.org/cartoons#Cat'),
  N3.DataFactory.namedNode('http://example.org/cartoons#Graph')
);

writer.end((error, result) => console.log(result));

Output:

<http://example.org/cartoons#Tom> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/cartoons#Cat> <http://example.org/cartoons#Graph> .

I would expect the writer to ignore the graph when I ask for the N-Triples format.

thombohlk avatar Mar 05 '19 12:03 thombohlk

That is currently indeed intended behavior; in the sense that the write assumes you are sending the right input for the format.

Not sure what we should do in this case. Silently ignoring the graph might not be best, dropping it isn't great either; just leaves an error in my opinion.

What do you think?

RubenVerborgh avatar Mar 05 '19 12:03 RubenVerborgh

I see your point, didn't look at it from that perspective.

Our current use case is one where we want to export a dataset (which can or cannot contain a graph field) in a certain format. From this point of view it would be weird if we would have to adapt the data to the format we want. I would say N3.js should contain the logic which fields are required for which format.

In my opinion the package should output the format that is requested and only that format. It should check whether or not all data for the requested format is available, and throw an error if it is not.

thombohlk avatar Mar 05 '19 13:03 thombohlk

Our current use case is one where we want to export a dataset (which can or cannot contain a graph field) in a certain format. From this point of view it would be weird if we would have to adapt the data to the format we want.

That makes sense on the one hand (as N3.js controls the format), but not on the other hand (given that only the source app knows how to deal with the graph).

It should check whether or not all data for the requested format is available, and throw an error if it is not.

Small issue there is that it is streaming, so we cannot check "all" the data, or at least not in advance. We could send an error the moment we encounter invalid data.

RubenVerborgh avatar Mar 05 '19 13:03 RubenVerborgh

@thombohlk maybe you want to try rdf-transform-triple-to-quad. You can feed it also with factory.defaultGraph() to create triples with that transformation.

bergos avatar Mar 05 '19 14:03 bergos

Arguably, if the caller specifically asked for N-Triples, the library should really only output valid N-Triples.

— So perhaps the writer should be able to enforce that?

Just my 2p!

jimsmart avatar Mar 05 '19 14:03 jimsmart

That is a possible argument; the other being that valid input should be sent for valid output. N3.js is a very low-level library; part of its speed comes from assuming valid input. My original point of view on this issue is simply that the behavior of sending quads to a triple writer is undefined. (Especially given that defining it correctly and universally is an impossibility.)

RubenVerborgh avatar Mar 05 '19 14:03 RubenVerborgh

Sure, understood. I can see both sides.

— Maybe it just needs more/improved documentation? (Certainly no perf impact with this option! 😉)

jimsmart avatar Mar 05 '19 14:03 jimsmart

Open to that! Let’s give people some more time to discuss, and then we can pick.

RubenVerborgh avatar Mar 05 '19 16:03 RubenVerborgh

Another possibility would be: when the caller asks specifically for N-Triples, swap out the existing code that writes quads, for a function that only writes triples (completely ignoring the graph).

This would not only solve the problem at hand, but wouldn’t affect the performance of writing N-Quads in any noticeable way (adding only an ‘if’ test at construction time) - and it would, theoretically at least, actually improve the performance of N-Triple writing! 👍 (because there would be one less ‘if’ statement in the core looped method that writes the quads/triples)

Edit: I say all that, but it’s possible that the extra indirection may have some negative impact — JavaScript virtual machines can be a little cranky to predict performance on, without measuring.

jimsmart avatar Mar 06 '19 12:03 jimsmart

(completely ignoring the graph).

This would not only solve the problem at hand

It wouldn't; who says it is correct to place triples in non-default graphs in the default graph? That is highly context-dependent. (For the same reason, dropping them is not the perfect solution either.)

RubenVerborgh avatar Mar 06 '19 12:03 RubenVerborgh

Ok, well that’s your call. I was just offering a solution that I hadn’t seen being discussed yet.

jimsmart avatar Mar 06 '19 12:03 jimsmart

Definitely, proposals welcome (I didn't intend to sound negative, apologies if I did). I like the bit about the performance (we do a similar swapping for line mode versus Turtle/N-Quads), but I'm hesitant to make decisions on behalf of the caller.

RubenVerborgh avatar Mar 06 '19 12:03 RubenVerborgh

…it could be configurable though, as in { handleQuads: 'ignore' / 'drop' / 'error' }

RubenVerborgh avatar Mar 06 '19 12:03 RubenVerborgh

That’s a good idea!

jimsmart avatar Mar 06 '19 12:03 jimsmart

That sounds like a good option! Another solution would be to split it in { strict: true / false, handleQuads: 'drop' / 'error' }. strict would indicate if you want to follow the W3C standards given by format, handleQuads would indicate how to handle inconsistencies between the data and the W3C standard if strict == true.

Both solutions work for me :+1:

thombohlk avatar Mar 06 '19 14:03 thombohlk

Just my 2c:

  • I wouldnt mind just removing the graph without warning, error or notice. To me, the fact that the callee specifies she'd like n-triples, means that that person is not only ok with dropping the graph names, but has the intention of doing that as well
  • Regardless of my first point, I'm in favor of any solution where I dont have to go through the whole store, array or stream again to remove the graphname myself ;)

LaurensRietveld avatar Apr 05 '19 12:04 LaurensRietveld

Open to that! Let’s give people some more time to discuss, and then we can pick.

So how about this? Was a conclusion reached?

Just to add to a years-old discussion: I would say that specifying n-triples is explicit enough to warrant dropping the graph if it happens to be present.

Making it configurable is also fine, but sounds like more work to implement.

MPvHarmelen avatar Jan 28 '22 14:01 MPvHarmelen

Just an update from our side. Not being able to serialize to ntriples without manipulating the quads is becoming a performance hit for us. (having to recreate all quads, removing the graphname) Are you open for a PR @RubenVerborgh ? I have no preference for either of the solutions above, so let me know which one you prefer

LaurensRietveld avatar Jan 23 '23 08:01 LaurensRietveld

PR is definitely fine, thanks 👍

RubenVerborgh avatar Jan 24 '23 11:01 RubenVerborgh

@RubenVerborgh Going for this:

 { handleQuads:  'keep' / 'drop' / 'error' }

Keep: writes the graph names Drop: omits the graph names Error: throws an error when encountering a graph name

Keep will be the default.

Does this sound good?

LaurensRietveld avatar Jan 24 '23 11:01 LaurensRietveld

Could we make it { graphs: 'keep' | 'ignore' | 'error' }, with a default of keep indeed. Default will be changed to error in next vmajor. Thanks a bunch.

RubenVerborgh avatar Jan 24 '23 12:01 RubenVerborgh