rdf4j LMDB Store performance improvements for smaller datasets

Open hmottestad opened this issue 1 year ago • 4 comments

The use of LMDB should allow the LmdbStore to scale better than the NativeStore. When testing the LmdbStore with the ShaclSail it seems that the NativeStore is often faster for smaller datasets when faced with a concurrent workload of small queries. We should try to improve the performance for these workloads by profiling the code and testing out new approaches.

Aug 30 '22 13:08 hmottestad

It's a shot in the dark, but I think it is related to the way cardinalities are currently computed.

Aug 31 '22 14:08 kenwenzel

That was in fact one issue I found. It's not so much the quality of the statistics but rather that it takes a lot of time to calculate them because of the read locks and IO. Adding a cache with a very short eviction time helped in my testing.
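
For illustration, a minimal sketch of what a cache with a very short eviction time could look like. This is not the code used in the testing mentioned above and not part of rdf4j: the class name `ShortLivedCache`, its constructor parameters, and the `get` signature are all hypothetical, and the key type and loader stand in for whatever identifies a statement pattern and computes its cardinality.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not an rdf4j class): keeps a computed value for a
 * short time so repeated lookups skip the expensive computation.
 */
final class ShortLivedCache<K, V> {

    /** A cached value together with the time at which it stops being valid. */
    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;

        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier clock;

    /**
     * @param ttlNanos how long a value stays valid, in nanoseconds
     * @param clock    time source, e.g. System::nanoTime (injectable for tests)
     */
    ShortLivedCache(long ttlNanos, LongSupplier clock) {
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    /** Returns the cached value if still fresh, otherwise recomputes it. */
    V get(K key, Supplier<V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        // Subtraction-based comparison stays correct if nanoTime wraps around.
        if (entry != null && now - entry.expiresAtNanos < 0) {
            return entry.value;
        }
        // Missing or expired: recompute. Two threads may both recompute at the
        // same time; that is assumed to be acceptable for an estimate.
        V value = loader.get();
        entries.put(key, new Entry<>(value, now + ttlNanos));
        return value;
    }
}
```

With `System::nanoTime` as the clock and a time-to-live of a few milliseconds, bursts of small concurrent queries would share one computed estimate per key, while the value still goes stale quickly enough to follow updates. Note that this sketch only overwrites expired entries and never removes them, so a real implementation would also need to bound the number of keys.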

Aug 31 '22 15:08 hmottestad

Maybe we could also have a look at zetasketch: https://github.com/google/zetasketch

Sep 05 '22 08:09 kenwenzel

https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

Sep 05 '22 09:09 kenwenzel
