rdf4j
LMDB Store performance improvements for smaller datasets
The use of LMDB should allow the LmdbStore to scale better than the NativeStore. When testing the LmdbStore with the ShaclSail it seems that the NativeStore is often faster for smaller datasets when faced with a concurrent workload of small queries. We should try to improve the performance for these workloads by profiling the code and testing out new approaches.
It's a shot in the dark, but I think it is related to the way cardinalities are currently computed.
That was in fact one issue I found. It's not so much the quality of the statistics but rather that it takes a lot of time to calculate them because of the read locks and IO. Adding a cache with a very short eviction time helped in my testing.
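To illustrate the idea, here is a minimal sketch of such a short-lived cache. It is not the code used in the LmdbStore: `ShortLivedCache`, its `get` method, and the injectable clock are hypothetical names made up for this example, and the key would be whatever identifies a cardinality lookup (for instance a statement pattern).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of RDF4J: remembers a computed value per key
// for a short time so that repeated lookups skip the expensive computation.
final class ShortLivedCache<K, V> {

    // A cached value together with the clock reading at which it was loaded.
    private static final class Entry<V> {
        final V value;
        final long loadedAtNanos;

        Entry(V value, long loadedAtNanos) {
            this.value = value;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier clock;

    // The clock is injectable so that expiry can be tested deterministically.
    ShortLivedCache(long ttlNanos, LongSupplier clock) {
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    ShortLivedCache(long ttlNanos) {
        this(ttlNanos, System::nanoTime);
    }

    // Returns the cached value if it is younger than the TTL; otherwise calls
    // the loader (the expensive part, e.g. a cardinality estimate that needs a
    // read lock and IO) and caches the fresh result.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && now - old.loadedAtNanos < ttlNanos) {
                return old; // still fresh, reuse it
            }
            return new Entry<>(loader.apply(k), now); // missing or expired, reload
        });
        return entry.value;
    }
}
```

With a TTL of a few milliseconds, something like `cache.get(pattern, p -> estimateCardinality(p))` (where `estimateCardinality` stands in for the real lookup) would let a burst of small concurrent queries share one estimate. Two caveats of this sketch: the loader runs while `compute` holds the lock for that key, so concurrent callers for the same key wait rather than recompute, and expired entries are only replaced when the same key is requested again, so a real implementation would also need to bound the map's size.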
Maybe we could also have a look at zetasketch: https://github.com/google/zetasketch
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html