Slow performance on `GRAPH_GROUP`

Open mhoangvslev opened this issue 1 year ago • 6 comments

  • I created this graph group:

    DB.DBA.RDF_GRAPH_GROUP_DROP('http://www.batch0.fr/', 0);
    DB.DBA.RDF_GRAPH_GROUP_CREATE('http://www.batch0.fr/',0);
    DB.DBA.RDF_GRAPH_USER_PERMS_SET ('http://www.batch0.fr/', 'nobody', 9);
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor0.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor1.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor2.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor3.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor4.fr/');
    
  • I executed this query, it takes forever when it should be instantaneous:

    SELECT COUNT(*)
    FROM <http://www.batch0.fr/>
    WHERE {
        ?s ?p ?o 
    } 
    

mhoangvslev avatar Apr 25 '24 14:04 mhoangvslev

Which Virtuoso version are you using? This works for me when querying from the SPARQL endpoint or isql with the latest develop/7 build.

HughWilliams avatar Apr 25 '24 14:04 HughWilliams

I use v7.2.12. The output of `virtuoso-t` is:

Version 7.2.12.3239-pthreads as of Feb 13 2024 (d698f21712)
Compiled for Linux (x86_64-alpine-linux-gnu)
Copyright (C) 1998-2024 OpenLink Software

mhoangvslev avatar Apr 25 '24 14:04 mhoangvslev

I also include in the link below the dump of the database I use (virtuoso.db + virtuoso.ini): https://drive.google.com/file/d/1lAlzAkr6Vy3BZZGjf59padrTXaffDoNj/view?usp=sharing

mhoangvslev avatar Apr 25 '24 15:04 mhoangvslev

In your test case, you only had 4 graphs in the graph group, with no data inserted in any of them, whereas in the database provided there are 20 graphs in the graph group, with a total of 3M+ triples across all the graphs.

Graph groups do not scale in Virtuoso Open Source: a query across a graph group is compiled as `SELECT ... G IN ()`, which results in multiple join condition tests. Performing those tests serially on every row is very time-consuming, so the query will not scale. The Virtuoso 8.x Commercial Edition implements a new invisible hash join algorithm, which compiles such queries as a hash IN join that runs in parallel and is thus more performant and scalable.
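To make the cost difference concrete, here is a rough conceptual sketch in Python. It is not Virtuoso's implementation, and it ignores parallelism entirely. It only contrasts testing each row's graph against a list of members one at a time with probing a hash set built once. The graph names, row layout and function names are all made up for illustration.

```python
# Conceptual sketch only: this is NOT how Virtuoso is implemented.
# It contrasts a serial IN-list test per row with a hash-set probe per row.

# Hypothetical graph group with 20 member graphs (names are illustrative).
GROUP = [f"http://www.vendor{i}.fr/" for i in range(20)]


def count_in_list(rows, members):
    """Serial IN-list test: up to len(members) comparisons for every row."""
    count = 0
    comparisons = 0
    for graph, _s, _p, _o in rows:
        for member in members:
            comparisons += 1
            if graph == member:
                count += 1
                break
    return count, comparisons


def count_hash(rows, members):
    """Hash membership: build the set once, then one probe per row."""
    member_set = set(members)
    count = 0
    probes = 0
    for graph, _s, _p, _o in rows:
        probes += 1
        if graph in member_set:
            count += 1
    return count, probes


def make_rows(n):
    """Synthetic quads spread evenly over the group plus one outside graph."""
    graphs = GROUP + ["http://www.other.fr/"]
    return [(graphs[i % len(graphs)], f"s{i}", "p", f"o{i}") for i in range(n)]


if __name__ == "__main__":
    rows = make_rows(2100)
    print("IN-list (count, comparisons):", count_in_list(rows, GROUP))
    print("hash    (count, probes):     ", count_hash(rows, GROUP))
```

Both functions return the same count, but the number of comparisons in the list version grows with the number of graphs in the group, while the hash version stays at one probe per row. That is only the general intuition behind the scaling difference described above, not a measurement of either edition.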

HughWilliams avatar Apr 26 '24 15:04 HughWilliams

Thank you for your insight!

The workaround is to ingest graph data of the same group into separate Virtuoso databases and execute the queries accordingly. Will the implementation be ported to Virtuoso Open Source at some point?

mhoangvslev avatar Apr 29 '24 09:04 mhoangvslev

There are no plans to port the invisible hash join feature to open source; it is a commercial-only feature.

HughWilliams avatar Apr 29 '24 15:04 HughWilliams