rdflib

Performance issues with rdflib.compare

Open • william-vw opened this issue 1 year ago • 1 comment

We're having performance issues with rdflib.compare. Although the test files contain many blank nodes, they are relatively small (50, 75, and 100 lines), and even so the run time grows by orders of magnitude between the smallest and the largest file. (Our actual files have close to 6,000 lines.)

Check out the repo with the test code here: https://github.com/william-vw/rdflib_compare_test

On my machine, I get the following times:

file: test1.ttl
isomorphic: 0.06714916229248047
canonical: 0.03405618667602539

file: test2.ttl
isomorphic: 3.1934521198272705
canonical: 1.6439540386199951

file: test3.ttl
isomorphic: 17.75542116165161
canonical: 8.694658994674683
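A quick back-of-the-envelope check on these numbers follows. It assumes the timings are in seconds and that test1, test2 and test3 are the 50-, 75- and 100-line files in that order; neither is stated explicitly. The sketch computes each file's slowdown relative to test1 and a two-point growth exponent k, where time grows roughly like lines^k. It also prints the isomorphic/canonical ratio, which stays close to 2. That fits the fact that `rdflib.compare.isomorphic()` canonicalizes both of its arguments, so comparing a graph with itself does the canonicalization work twice.

```python
import math

# Timings copied from the issue (assumed to be seconds).
# ASSUMPTION: test1/test2/test3 are the 50/75/100-line files, in that order.
LINES = {"test1.ttl": 50, "test2.ttl": 75, "test3.ttl": 100}
TIMES = {
    "test1.ttl": {"isomorphic": 0.06714916229248047, "canonical": 0.03405618667602539},
    "test2.ttl": {"isomorphic": 3.1934521198272705, "canonical": 1.6439540386199951},
    "test3.ttl": {"isomorphic": 17.75542116165161, "canonical": 8.694658994674683},
}


def growth_exponent(n0: float, t0: float, n1: float, t1: float) -> float:
    """Exponent k such that t ~ n**k between two (size, time) points."""
    return math.log(t1 / t0) / math.log(n1 / n0)


def summarize(kind: str) -> list:
    """Slowdown and growth exponent of test2/test3 relative to test1."""
    base = "test1.ttl"
    rows = []
    for name in ("test2.ttl", "test3.ttl"):
        ratio = TIMES[name][kind] / TIMES[base][kind]
        k = growth_exponent(LINES[base], TIMES[base][kind],
                            LINES[name], TIMES[name][kind])
        rows.append((name, ratio, k))
    return rows


for kind in ("isomorphic", "canonical"):
    for name, ratio, k in summarize(kind):
        print(f"{kind:10s} {name}: {ratio:6.1f}x slower than test1, k ~ {k:.1f}")

# isomorphic(g, g) canonicalizes g twice, so expect roughly 2x the canonical time.
for name, t in TIMES.items():
    print(f"{name}: isomorphic / canonical = {t['isomorphic'] / t['canonical']:.2f}")
```

With only three data points, k (between about 8 and 9.6 here) is a rough indicator rather than a complexity bound, and line count is only a proxy for the number of blank nodes.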

I based the comparison on this example (for testing purposes, I compare a graph with itself).

william-vw, Aug 15 '23 17:08

Hi @william-vw, sadly the code for compare is quite complex, but any improvements will be welcomed and merged.

aucampia, Aug 16 '23 19:08