Blank node skolemization
HDT files that contain blank nodes produce invalid Turtle output. Can we create a skolemized Model somehow?
On the one hand, this seems to be a problem with the HDT library not correctly converting blank nodes to the corresponding Jena representation. On the other hand, the TPF spec clearly says that components must not be blank nodes, so we should indeed skolemize them in any case, like the JavaScript implementation does.
This might help, although reprocessing all nodes doesn't seem very performant.
We should be able to do the same as in the JavaScript code by just changing this function (probably on the base class level even).
Right, that would be fairly easy, but it would be datasource specific though... Another option would be to create a decorator for https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/riot/WriterGraphRIOT.html
Well, in the JavaScript version, it's implemented on the base class, so not source-specific. I still think this is possible. The additional complexity here is that dictionary.getNode seems to have a bug so that it does not return blank nodes but IRIs that start with _:. The generic solution would be to work around that, and then everything works with a generic base method (or RIOT decorator). But then we have some performance loss, because it would first (incorrectly) convert to IRI, then to blank, then to IRI again. So it might be best to have a one-off solution here for performance.
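To make the one-off option a bit more concrete, here is a minimal sketch of skolemizing at the string level, before anything is converted to a Jena node. It assumes the dictionary hands back each term as a string such as `_:b1`. The class name `Skolemizer`, its methods, and the base URL are hypothetical and not part of the HDT or Jena APIs. The `/.well-known/genid/` path is the one RDF 1.1 Concepts suggests for Skolem IRIs.

```java
/**
 * Sketch: skolemize blank-node terms while they are still plain strings,
 * before any conversion to a Jena Node.
 * The class name, method names, and base URL are hypothetical.
 */
public class Skolemizer {
    // Well-known path that RDF 1.1 Concepts suggests for Skolem IRIs.
    private static final String GENID_PATH = "/.well-known/genid/";

    private final String prefix;

    public Skolemizer(String baseUrl) {
        // Drop a trailing slash so the prefix never contains "//".
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        this.prefix = base + GENID_PATH;
    }

    /** Turns "_:label" into a Skolem IRI; other terms pass through unchanged. */
    public String skolemize(CharSequence term) {
        String s = term.toString();
        // Assumption: the label only contains characters that are legal in an IRI path.
        return s.startsWith("_:") ? prefix + s.substring(2) : s;
    }

    /** Inverse mapping, e.g. for Skolem IRIs that come back in a request pattern. */
    public String deskolemize(CharSequence term) {
        String s = term.toString();
        return s.startsWith(prefix) ? "_:" + s.substring(prefix.length()) : s;
    }
}
```

Because the mapping only depends on the label and the base URL, the same blank node gets the same IRI on every request, as long as the underlying labels themselves are stable. Whether the server also needs the inverse mapping depends on how incoming patterns are resolved; it is included here only as an illustration.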
So we'll have to improve the Java HDT code no matter what, which gives us the opportunity to move to Jena 3.
Hi all, I just stumbled over this and it still seems to be an issue. I'm working on some other things in Server.java (including support for quad formats), so if you can give me any hints on how to fix this issue, I can give it a try. A related note: the TPF specification says that bNodes SHOULD be skolemized, not that it is mandatory. Does anyone here know whether e.g. Comunica requires bNodes to be skolemized? Also, for TPF to work with bNodes, a TPF server MUST keep bNode identifiers consistent over consecutive requests; I don't think that's explicit in the spec. Thanks, Lars
As per https://github.com/comunica/comunica/issues/375 the spec now says that data triples MUST NOT contain blank nodes and that the RECOMMENDED way of removing them is skolemization.
Is there such a thing as a conformance test suite for TPF servers?
Not yet unfortunately, but that would indeed be very nice to have.
Sounds like fun. You can assign it to me; I'll try it as a Friday afternoon thing.