rdf4j
GH-5343 Make LMDBSail Size() 36000x Faster 🚀🚀🚀🚀
GitHub issue resolved: #5343

Briefly describe the changes proposed in this PR:
This PR optimizes the `size(...)` method in the LMDB store implementation. It introduces a `cardinalityExact` method that calculates the exact size, leveraging `mdb_stat` when possible.
Key Changes
- ChangeSet
  - Fast-path Optimization
    - When there are no `approved` or `deprecated` changes in the current transaction, or when no transaction is active, the method delegates directly to the `derivedFrom` store for fast cardinality estimation.
  - Fallback to Iterator
    - When changes exist in the transaction, the method falls back to streaming through matching statements using `getStatements(...).stream().count()`. This bypasses LMDB's lazy evaluation to ensure consistency, even with uncommitted changes.
- Low-level Size Calculation (`cardinalityExact`)
  - If the pattern is completely unspecified (i.e. all wildcards), the method uses LMDB's `mdb_stat` to return the total size efficiently.
  - For specific patterns, it iterates over both explicit and implicit triples and counts the results.
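The dispatch described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `SizeSketch`, `BaseStore`, `ChangeSet` and `Quad` are hypothetical stand-ins written for this example, not RDF4J classes, and the in-memory set merely plays the role of the LMDB indexes. A `null` context stands for the all-wildcard pattern.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: none of these types are RDF4J or LMDB classes.
final class SizeSketch {

    /** A statement reduced to a triple label plus a context name. */
    record Quad(String triple, String context) {}

    /** Stand-in for the backing store that a changeset is derived from. */
    static final class BaseStore {
        private final Set<Quad> quads = new LinkedHashSet<>();

        void add(Quad q) {
            quads.add(q);
        }

        /** Exact count; a null context plays the role of the all-wildcard pattern. */
        long cardinalityExact(String context) {
            if (context == null) {
                // Analogous to reading the entry count from mdb_stat: no iteration needed.
                return quads.size();
            }
            // Specific pattern: walk the matching entries and count them.
            return quads.stream().filter(q -> q.context().equals(context)).count();
        }
    }

    /** Stand-in for a transaction's pending additions and removals. */
    static final class ChangeSet {
        final Set<Quad> approved = new HashSet<>();
        final Set<Quad> deprecated = new HashSet<>();

        boolean isEmpty() {
            return approved.isEmpty() && deprecated.isEmpty();
        }
    }

    /** Chooses between the fast path and the iterator-style fallback. */
    static long size(BaseStore base, ChangeSet changes, String context) {
        if (changes == null || changes.isEmpty()) {
            // Fast path: no transaction or no pending changes, so ask the store directly.
            return base.cardinalityExact(context);
        }
        // Fallback: build the view the transaction would see, then count matches.
        Set<Quad> view = new HashSet<>(base.quads);
        view.removeAll(changes.deprecated);
        view.addAll(changes.approved);
        return view.stream()
                .filter(q -> context == null || q.context().equals(context))
                .count();
    }

    public static void main(String[] args) {
        BaseStore base = new BaseStore();
        base.add(new Quad("s1 p o", "g1"));
        base.add(new Quad("s2 p o", "g2"));
        System.out.println("total: " + size(base, null, null));
        System.out.println("in g1: " + size(base, null, "g1"));
    }
}
```

The point of the split is that the cheap path is only taken when the store's own count is guaranteed to match what the caller can see. As soon as uncommitted changes exist, the count has to come from the merged view.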
Perf
I created an LMDBSail with 10M triples.
Original `size()`: 21802 ms
Optimized `size()`:
- 685.6 μs to get the full size by leveraging `mdb_stat`.
- 274.2 ms to get the size of a context of 5,000,000 triples.
Total Size: 10000000, Time taken: 685.6 μs
Size in context: 5000000, Time taken: 274.2 ms
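For readers who want to relate the timings to each other, the ratios can be worked out with a few lines of Java. This is a sketch that assumes the 21802 ms baseline applies to both cases, which the description does not state explicitly. `SpeedupSketch` and `speedup` are hypothetical names used only here.

```java
// Illustrative arithmetic only; the input figures are the ones reported above.
final class SpeedupSketch {

    /** Ratio of two durations expressed in the same unit (microseconds here). */
    static double speedup(double baselineMicros, double optimizedMicros) {
        return baselineMicros / optimizedMicros;
    }

    public static void main(String[] args) {
        double baseline = 21_802 * 1_000.0;    // original size(): 21802 ms
        double fullSize = 685.6;               // optimized full size: 685.6 μs
        double contextSize = 274.2 * 1_000.0;  // optimized context size: 274.2 ms

        System.out.printf("full size: ~%.0fx%n", speedup(baseline, fullSize));
        System.out.printf("context:   ~%.1fx%n", speedup(baseline, contextSize));
    }
}
```

With these inputs the full-size case comes out at roughly 31,800x and the context case at roughly 80x, so the all-wildcard path benefits far more than the pattern-specific one.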
PR Author Checklist (see the contributor guidelines for more details):
- [x] my pull request is self-contained
- [x] I've added tests for the changes I made
- [x] I've applied code formatting (you can use `mvn process-resources` to format from the command line)
- [x] I've squashed my commits where necessary
- [x] every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change