Jena SPARQL-star queries perform poorly

Version

4.3.2

What happened?

I evaluated Jena TDB2 4.3.2 with an RDF-star dataset of 9,411,041 triples (4.9 GB). I loaded the dataset into on-disk storage with the TDB2 loader. Then I tried a set of rather complex SPARQL-star queries, only to find that none of them finished. To pin down the issue, I tried a very simple SPARQL-star query that contains only one nested quoted triple pattern:

select * { <<<<?s ?p ?o>> ?a ?b >> ?x ?y. }

However, even this query took more than 10 minutes (639 seconds):

15:47:30 INFO  Server          :: Apache Jena Fuseki 4.3.2
15:47:30 INFO  Config          :: FUSEKI_HOME=/opt/apache_jena_fuseki/apache-jena-fuseki-4.3.2
15:47:30 INFO  Config          :: FUSEKI_BASE=/home/fkovacev/.jena/fuseki-4.3.2
15:47:30 INFO  Config          :: Shiro file: file:///home/fkovacev/.jena/fuseki-4.3.2/shiro.ini
15:47:31 INFO  Config          :: Load configuration: file:///home/fkovacev/.jena/fuseki-4.3.2/configuration/bearc_tb_sr_rs.ttl
15:47:31 INFO  Server          :: Configuration file: /home/fkovacev/.jena/fuseki-4.3.2/config.ttl
15:47:31 INFO  Server          :: Path = /bearc_tb_sr_rs
15:47:31 INFO  Server          :: System
15:47:31 INFO  Server          ::   Memory: 4.0 GiB
15:47:31 INFO  Server          ::   Java:   17.0.5
15:47:31 INFO  Server          ::   OS:     Linux 5.15.0-58-generic amd64
15:47:31 INFO  Server          ::   PID:    106127
15:47:31 INFO  Server          :: Started 2023/02/03 15:47:31 CET on port 3030
15:47:52 INFO  Fuseki          :: [5] POST http://localhost:3030/bearc_tb_sr_rs/sparql
15:47:52 INFO  Fuseki          :: [5] Query = select * { <<<<?s ?p ?o>> ?a ?b >> ?x ?y. } 
15:58:31 INFO  Fuseki          :: [5] 200 OK (638.810 s)

Memory did not seem to be the problem.

I tried the same set of queries on GraphDB and they all needed only a few seconds. Is it possible that Jena generally performs poorly with SPARQL-star and even worse if there are multiple nesting levels?

Relevant output and stacktrace

No response

Are you interested in making a pull request?

None

Feb 03 '23 15:02 GreenfishK

Yes, it is possible.

Jena's current support for RDF-star has not changed the on-disk data structures, except for adding the new RDF term type. This lets people try RDF-star without disrupting their other databases or needing multiple versions of the code on their systems.

The RDF-star Working Group has started. I'd appreciate understanding the use case for nested quoted triples.

Feb 05 '23 13:02 afs

The use case is timestamp-based versioning of RDF datasets using RDF-star and SPARQL-star. As part of my research, I built an API that lets you update RDF triples and issue SPARQL queries; it automatically transforms them into RDF-star triples and SPARQL-star queries with timestamps attached. The RDF-star triples look like this:

<< << <http://example.com/s> <http://example.com/p> "o" >> :valid_from "2023-02-06T12:00:00"^^xsd:dateTime >> :valid_until "9999-12-31T12:00:00"^^xsd:dateTime .

Using two nesting levels, I can attach a creation and deletion timestamp.
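
To make the nesting concrete, here is a minimal Python sketch that builds such a statement as a string. It is not the starvers API: the function name `nest_with_timestamps` and its parameters are hypothetical, and the terms are assumed to be already serialized in Turtle-star syntax.

```python
def nest_with_timestamps(s, p, o, valid_from, valid_until):
    """Build one Turtle-star statement with two nesting levels.

    Hypothetical helper for illustration only; s, p and o must already be
    serialized terms (e.g. '<http://example.com/s>' or '"o"').
    """
    # Level 0: the plain data triple as a quoted triple.
    inner = f"<< {s} {p} {o} >>"
    # Level 1: the creation timestamp annotates the data triple.
    middle = f'<< {inner} :valid_from "{valid_from}"^^xsd:dateTime >>'
    # Level 2: the deletion timestamp annotates the level-1 quoted triple.
    return f'{middle} :valid_until "{valid_until}"^^xsd:dateTime .'


statement = nest_with_timestamps(
    "<http://example.com/s>",
    "<http://example.com/p>",
    '"o"',
    "2023-02-06T12:00:00",
    "9999-12-31T12:00:00",
)
print(statement)
```

The data triple appears exactly once in the result, which is the point of using two nesting levels.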

I also tried the more intuitive and semantically correct approach:

<< <http://example.com/s> <http://example.com/p> "o" >> :valid_from "2023-02-06T12:00:00"^^xsd:dateTime .
<< <http://example.com/s> <http://example.com/p> "o" >> :valid_until "9999-12-31T12:00:00"^^xsd:dateTime .

However, with this approach the datasets are bigger because of the redundancy (the data triple is repeated once per timestamp), and query performance is worse in both GraphDB 9.3 and Jena TDB2 4.3.2. So I decided to go with nested quoted triples.
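
As a rough illustration of that redundancy, the following Python sketch serializes the same example in both forms and counts how often the data triple is written out. The variable names are made up for this example, and the length of the serialized text is only a proxy; it says nothing about how a particular store lays the data out on disk.

```python
# Example terms taken from the statements above.
DATA_TRIPLE = '<http://example.com/s> <http://example.com/p> "o"'
VALID_FROM = '"2023-02-06T12:00:00"^^xsd:dateTime'
VALID_UNTIL = '"9999-12-31T12:00:00"^^xsd:dateTime'

# Nested form: one statement, both timestamps attached via two nesting levels.
nested = (
    f"<< << {DATA_TRIPLE} >> :valid_from {VALID_FROM} >> "
    f":valid_until {VALID_UNTIL} ."
)

# Flat form: two statements, each quoting the data triple again.
flat = (
    f"<< {DATA_TRIPLE} >> :valid_from {VALID_FROM} .\n"
    f"<< {DATA_TRIPLE} >> :valid_until {VALID_UNTIL} ."
)

# Count how many times the data triple is spelled out in each form.
nested_repeats = nested.count(DATA_TRIPLE)
flat_repeats = flat.count(DATA_TRIPLE)

print("data triple occurrences (nested, flat):", nested_repeats, flat_repeats)
print("serialized length (nested, flat):", len(nested), len(flat))
```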

More information:
API: https://github.com/GreenfishK/starvers
Evaluation of the RDF-star and timestamp-based versioning approach (still ongoing): https://github.com/GreenfishK/starvers_eval
Paper submitted to SWJ (major revision to be submitted soon): http://semantic-web-journal.org/content/starvers-versioning-and-timestamping-rdf-data-means-rdf-approach-based-annotated-triples

Feb 06 '23 07:02 GreenfishK