virtuoso-opensource Slow performance on `GRAPH_GROUP`

I created this graph group:

DB.DBA.RDF_GRAPH_GROUP_DROP('http://www.batch0.fr/', 0);
DB.DBA.RDF_GRAPH_GROUP_CREATE('http://www.batch0.fr/', 0);
DB.DBA.RDF_GRAPH_USER_PERMS_SET('http://www.batch0.fr/', 'nobody', 9);
DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor0.fr/');
DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor1.fr/');
DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor2.fr/');
DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor3.fr/');
DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor4.fr/');

I executed this query; it takes forever when it should be instantaneous:

SELECT COUNT(*)
FROM <http://www.batch0.fr/>
WHERE {
    ?s ?p ?o 
}

Apr 25 '24 14:04 mhoangvslev

Which Virtuoso version are you using? This works for me when querying from the SPARQL endpoint or isql with the latest develop/7 build.

Apr 25 '24 14:04 HughWilliams

I use v7.2.12. The output of virtuoso-t is:

Version 7.2.12.3239-pthreads as of Feb 13 2024 (d698f21712)
Compiled for Linux (x86_64-alpine-linux-gnu)
Copyright (C) 1998-2024 OpenLink Software

Apr 25 '24 14:04 mhoangvslev

I also include in the link below the dump of the database I use (virtuoso.db + virtuoso.ini): https://drive.google.com/file/d/1lAlzAkr6Vy3BZZGjf59padrTXaffDoNj/view?usp=sharing

Apr 25 '24 15:04 mhoangvslev

In your test case, you only had 4 graphs in the graph group, with no data inserted in any of them, whereas in the database provided there are 20 graphs in the graph group, with a total of 3M+ triples across all the graphs.

Graph groups do not scale in Virtuoso Open Source: a query across a graph group is compiled as SELECT ... G IN (), which results in multiple join condition tests. Performing these tests serially on every row is very time consuming, so the query does not scale. The Virtuoso 8.x Commercial Edition implements a new invisible hash join algorithm, which compiles such queries as a hash IN join that runs in parallel and is thus more performant and scalable.
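
To make the difference concrete, here is a rough conceptual sketch in Python. It is not Virtuoso's actual implementation; the function names and the sample data are made up for illustration. It contrasts testing each row's graph against every group member in turn (the G IN () behaviour described above) with a single hash lookup per row. The parallel execution of the commercial hash IN join is not modelled here.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical row shape for illustration: (graph, subject, predicate, object)
Quad = Tuple[str, str, str, str]


def count_serial_in(quads: Iterable[Quad], group: Sequence[str]) -> int:
    """Count rows whose graph is in the group by comparing against
    each member in turn, similar in spirit to G IN (g1, g2, ...)."""
    matches = 0
    for g, _s, _p, _o in quads:
        for member in group:  # up to len(group) comparisons per row
            if g == member:
                matches += 1
                break
    return matches


def count_hash_in(quads: Iterable[Quad], group: Sequence[str]) -> int:
    """Count the same rows with one hash lookup per row."""
    members = frozenset(group)  # hash set built once, before the scan
    return sum(1 for g, *_ in quads if g in members)


# Made-up data: a 20-graph group, and 1000 rows spread over 25 graphs.
group = [f"http://www.vendor{i}.fr/" for i in range(20)]
quads = [
    (f"http://www.vendor{i % 25}.fr/", f"s{i}", "p", f"o{i}")
    for i in range(1000)
]

serial = count_serial_in(quads, group)
hashed = count_hash_in(quads, group)
print(serial, hashed)
```

Both functions return the same count; the difference is the per-row cost, which grows with the number of graphs in the group in the first case and stays roughly constant in the second.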

Apr 26 '24 15:04 HughWilliams

Thank you for your insight!

The workaround is to ingest graph data of the same group into separate Virtuoso databases and execute the queries accordingly. Will the implementation be ported to Virtuoso Open Source at some point?

Apr 29 '24 09:04 mhoangvslev

There are no plans to port the invisible hash join feature to open source; it is a commercial-only feature.

Apr 29 '24 15:04 HughWilliams
