LMDB Store performance improvements for smaller datasets

Open hmottestad opened this issue 1 year ago • 4 comments

The use of LMDB should allow the LmdbStore to scale better than the NativeStore. When testing the LmdbStore with the ShaclSail it seems that the NativeStore is often faster for smaller datasets when faced with a concurrent workload of small queries. We should try to improve the performance for these workloads by profiling the code and testing out new approaches.

hmottestad avatar Aug 30 '22 13:08 hmottestad

It's a shot in the dark, but I think it is related to the way cardinalities are currently computed.

kenwenzel avatar Aug 31 '22 14:08 kenwenzel

That was in fact one issue I found. It's not so much the quality of the statistics but rather that it takes a lot of time to calculate them because of the read locks and IO. Adding a cache with a very short eviction time helped in my testing.
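
For illustration, here is a minimal sketch of what a short-lived cache for cardinality estimates could look like. It is not the code used in the testing mentioned above: the class name `ShortLivedCardinalityCache`, the key type, and the `loader` callback are all hypothetical, and the eviction time is left to the caller.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical helper, not part of rdf4j: caches cardinality estimates for a
 * short time so that bursts of small queries skip the read lock and the IO.
 */
final class ShortLivedCardinalityCache<K> {

    /** A cached estimate together with the time at which it goes stale. */
    private static final class Entry {
        final double value;
        final long expiresAtNanos;

        Entry(double value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final ToDoubleFunction<K> loader;

    /** The loader stands in for the expensive cardinality computation (an assumption). */
    ShortLivedCardinalityCache(long ttl, TimeUnit unit, ToDoubleFunction<K> loader) {
        this.ttlNanos = unit.toNanos(ttl);
        this.loader = loader;
    }

    double get(K key) {
        long now = System.nanoTime();
        Entry cached = entries.get(key);
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        if (cached != null && now - cached.expiresAtNanos < 0) {
            return cached.value;
        }
        // Concurrent callers may compute the same value twice; that is harmless
        // for an estimate and avoids holding a lock during the computation.
        double value = loader.applyAsDouble(key);
        entries.put(key, new Entry(value, now + ttlNanos));
        return value;
    }

    /** Drops all cached estimates, e.g. after a write transaction commits. */
    void clear() {
        entries.clear();
    }
}
```

Stale entries are only overwritten, never removed, so a real implementation would also need a size bound or periodic cleanup. The trade-off is that estimates can be out of date by up to the eviction time, which seems acceptable for query planning.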

hmottestad avatar Aug 31 '22 15:08 hmottestad

Maybe we could also have a look at zetasketch: https://github.com/google/zetasketch

kenwenzel avatar Sep 05 '22 08:09 kenwenzel

https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html

kenwenzel avatar Sep 05 '22 09:09 kenwenzel