
GH-5343 Make LMDBSail Size() 36000x Faster 🚀🚀🚀🚀

Open odysa opened this issue 4 months ago • 12 comments

GitHub issue resolved: #5343

Briefly describe the changes proposed in this PR:

This PR introduces an optimization for the size(...) method in the LMDB store implementation. It adds a cardinalityExact method to calculate the exact size, leveraging mdb_stat when possible.

Key Changes

  • ChangeSet

    • Fast-path Optimization

      • When the current transaction has no approved or deprecated changes, or when there is no active transaction, the method delegates directly to the derivedFrom store for a fast cardinality lookup.
    • Fallback to Iterator

      • When changes exist in the transaction, the method falls back to streaming through matching statements using getStatements(...).stream().count(). This bypasses LMDB’s lazy evaluation to ensure consistency, even with uncommitted changes.
  • Low-level Size Calculation (cardinalityExact)

    • If the pattern is completely unspecified (i.e. all wildcards), the method uses LMDB's mdb_stat to return the total size efficiently.
    • For specific patterns, it iterates over both explicit and implicit triples and counts the results.
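The two levels described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the code in this PR: ToyStore and ToyChangeset are hypothetical names, a statement is modelled as a four-element list (subject, predicate, object, context), a null pattern component is treated as a wildcard, and the set's size() stands in for the entry count that mdb_stat reports. The explicit/implicit split is not modelled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the committed store (not the RDF4J class). */
final class ToyStore {
    // Each statement is a list of [subject, predicate, object, context].
    final Set<List<String>> committed = new LinkedHashSet<>();

    /** A null pattern component acts as a wildcard. */
    static boolean matches(List<String> st, String s, String p, String o, String c) {
        return (s == null || s.equals(st.get(0)))
                && (p == null || p.equals(st.get(1)))
                && (o == null || o.equals(st.get(2)))
                && (c == null || c.equals(st.get(3)));
    }

    long cardinalityExact(String s, String p, String o, String c) {
        if (s == null && p == null && o == null && c == null) {
            // All wildcards: answer from the stored entry count, no scan.
            // In the PR this role is played by LMDB's mdb_stat.
            return committed.size();
        }
        // Specific pattern: iterate and count the matching statements.
        return committed.stream().filter(st -> matches(st, s, p, o, c)).count();
    }
}

/** Hypothetical stand-in for a transaction's change set. */
final class ToyChangeset {
    final ToyStore derivedFrom;
    final Set<List<String>> approved = new LinkedHashSet<>();   // pending additions
    final Set<List<String>> deprecated = new LinkedHashSet<>(); // pending removals

    ToyChangeset(ToyStore derivedFrom) {
        this.derivedFrom = derivedFrom;
    }

    long size(String s, String p, String o, String c) {
        if (approved.isEmpty() && deprecated.isEmpty()) {
            // Fast path: nothing pending, so the backing store's count is exact.
            return derivedFrom.cardinalityExact(s, p, o, c);
        }
        // Fallback: build the view the transaction would see and count matches.
        Set<List<String>> view = new LinkedHashSet<>(derivedFrom.committed);
        view.removeAll(deprecated);
        view.addAll(approved);
        return view.stream().filter(st -> ToyStore.matches(st, s, p, o, c)).count();
    }
}
```

The point of the split is that the cheap count is only trusted when no uncommitted changes could make it stale; otherwise the sketch pays for a full pass over the matching statements, which mirrors the trade-off described in the bullets.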

Perf

I created an LMDBSail with 10M triples.

Original size(): 21802 ms

Optimized size(): 685.6 μs to get the full size by leveraging mdb_stat, and 274.2 ms to get the size of a context containing 5,000,000 triples.

Total Size: 10000000, Time taken: 685.6 μs
Size in context: 5000000, Time taken: 274.2 ms

PR Author Checklist (see the contributor guidelines for more details):

  • [x] my pull request is self-contained
  • [x] I've added tests for the changes I made
  • [x] I've applied code formatting (you can use mvn process-resources to format from the command line)
  • [x] I've squashed my commits where necessary
  • [x] every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

odysa, May 31 '25 16:05