Blank node skolemization

Open mielvds opened this issue 10 years ago • 11 comments

HDT files that contain blank nodes produce invalid Turtle output. Can we create a skolemized Model somehow?

mielvds avatar Dec 02 '15 09:12 mielvds

On the one hand, this seems to be a problem with the HDT library not correctly converting blank nodes to the corresponding Jena representation. On the other hand, the TPF spec clearly says that components must not be blank nodes, so we should indeed skolemize them in any case, like the JavaScript implementation does.

RubenVerborgh avatar Dec 02 '15 09:12 RubenVerborgh

This might help, although reprocessing all nodes doesn't seem very performant.

mielvds avatar Dec 02 '15 09:12 mielvds

We should be able to do the same as in the JavaScript code by just changing this function (probably on the base class level even).

RubenVerborgh avatar Dec 02 '15 09:12 RubenVerborgh

Right, that would be fairly easy, but it would be datasource-specific... Another option would be to create a decorator for https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/riot/WriterGraphRIOT.html

mielvds avatar Dec 02 '15 09:12 mielvds

Well, in the JavaScript version, it's implemented on the base class, so not source-specific. I still think this is possible. The additional complexity here is that dictionary.getNode seems to have a bug so that it does not return blank nodes but IRIs that start with _:. The generic solution would be to work around that, and then everything works with a generic base method (or RIOT decorator). But then we have some performance loss, because it would first (incorrectly) convert to IRI, then to blank, then to IRI again. So it might be best to have a one-off solution here for performance.
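A minimal sketch of the one-off approach, assuming terms arrive as plain strings and that blank nodes are recognizable by the `_:` prefix mentioned above. The class name `BlankNodeSkolemizer`, the `origin` parameter, and the use of the `/.well-known/genid/` path (the one RDF 1.1 Concepts suggests for Skolem IRIs) are illustrative assumptions, not existing Server.java or HDT code. Labels are assumed to be IRI-safe and are not escaped here.

```java
/** Hypothetical helper; not part of Server.java or the HDT library. */
final class BlankNodeSkolemizer {
    // Path that RDF 1.1 Concepts suggests for minting Skolem IRIs.
    private static final String GENID_PATH = "/.well-known/genid/";

    private final String prefix;

    /** @param origin scheme and authority of the server, e.g. "http://example.org" (assumed) */
    BlankNodeSkolemizer(String origin) {
        // Drop a trailing slash so the prefix never contains "//" after the authority.
        String base = origin.endsWith("/")
                ? origin.substring(0, origin.length() - 1)
                : origin;
        this.prefix = base + GENID_PATH;
    }

    /**
     * Maps "_:label" to a Skolem IRI in a single step; every other term is
     * returned unchanged. The mapping depends only on the label, so the same
     * blank node always yields the same IRI.
     */
    String skolemize(String term) {
        if (term != null && term.startsWith("_:")) {
            return prefix + term.substring(2);
        }
        return term;
    }
}
```

Applying `skolemize` directly to the string that comes out of the dictionary would avoid the IRI-to-blank-to-IRI round trip described above, at the cost of being specific to this datasource.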

RubenVerborgh avatar Dec 02 '15 09:12 RubenVerborgh

So we'll have to improve the Java HDT code no matter what, which gives us the opportunity to move to Jena 3.

mielvds avatar Dec 02 '15 09:12 mielvds

Hi all, I just stumbled over this and it still seems to be an issue. I'm working on some other things in Server.java (including support for quad formats), so if you can give me any hints on how to fix this issue, I can give it a try. A related note: the TPF specification says that bNodes SHOULD be skolemized, not that it is mandatory. Does anyone here know if e.g. Comunica requires bNodes to be skolemized? Also, for TPF to work with bNodes, a TPF server MUST keep bNode identifiers consistent over consecutive requests; I don't think that's explicit in the spec. Thanks, Lars

larsgsvensson avatar Jan 10 '19 17:01 larsgsvensson

As per https://github.com/comunica/comunica/issues/375 the spec now says that data triples MUST NOT contain blank nodes and that the RECOMMENDED way of removing them is skolemization.
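The MUST NOT requirement lends itself to a simple automated check. The sketch below assumes the data triples of a response have already been parsed into subject/predicate/object strings in N-Triples style, where blank nodes start with `_:`. The class `BlankNodeCheck` and its methods are hypothetical names for illustration only.

```java
import java.util.List;

/** Hypothetical check; not part of any existing TPF test suite. */
final class BlankNodeCheck {
    /** A term is treated as a blank node if it uses the "_:" label syntax. */
    static boolean isBlankNode(String term) {
        return term != null && term.startsWith("_:");
    }

    /**
     * Returns true only if no subject, predicate, or object in the given
     * triples is a blank node. Each triple is a String[] of its terms.
     */
    static boolean hasNoBlankNodes(List<String[]> triples) {
        for (String[] triple : triples) {
            for (String term : triple) {
                if (isBlankNode(term)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```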

larsgsvensson avatar Jan 18 '19 08:01 larsgsvensson

Is there such a thing as a conformance test suite for TPF servers?

larsgsvensson avatar Jan 18 '19 09:01 larsgsvensson

Not yet unfortunately, but that would indeed be very nice to have.

RubenVerborgh avatar Jan 18 '19 09:01 RubenVerborgh

Sounds like fun. You can assign it to me; I'll try it as a Friday afternoon thing.

mielvds avatar Jan 18 '19 10:01 mielvds