
GH-4134 LMDB Store performance improvements

Open hmottestad opened this issue 3 years ago • 9 comments

GitHub issue resolved: #4134

Briefly describe the changes proposed in this PR:


PR Author Checklist (see the contributor guidelines for more details):

  • [ ] my pull request is self-contained
  • [ ] I've added tests for the changes I made
  • [ ] I've applied code formatting (you can use mvn process-resources to format from the command line)
  • [ ] I've squashed my commits where necessary
  • [ ] every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

hmottestad avatar Sep 01 '22 07:09 hmottestad

@kenwenzel here are some optimizations I'm testing out.

hmottestad avatar Sep 01 '22 07:09 hmottestad

Looks good. I also thought about integrating the estimate functions of libmdbx into LMDB: https://libmdbx.dqdkfa.ru/group__c__rqest.html

It should be rather easy to patch LMDB, but that would also mean that we have to maintain a patched version of the LWJGL bindings...

kenwenzel avatar Sep 01 '22 07:09 kenwenzel

> Looks good. I also thought about integrating the estimate functions of libmdbx into LMDB: https://libmdbx.dqdkfa.ru/group__c__rqest.html
>
> It should be rather easy to patch LMDB, but that would also mean that we have to maintain a patched version of the LWJGL bindings...

Do you think it would be possible to contribute it to LMDB instead?

Reading through the LMDB documentation I saw that there should be some cursor operations to get the first and last page, if the database is ordered. Maybe there is some way to estimate the size based on the addresses of the first and last page?

I remember you mentioned that you were reading a max of 1000 elements from the cursor when estimating the cardinality. Is that still the case? Maybe we could do some optimized caching where we evict cardinality results based on how many resources it took to calculate them, so the cardinality for null rdf:type null null would be cached much longer than the one for :peter rdf:type null null.
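
A minimal sketch of that cost-weighted eviction idea (not RDF4J code; the class, field and threshold names are made up for illustration): expensive estimates stay cached, cheap ones get evicted first and are simply re-estimated on demand.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.DoubleSupplier;

class CardinalityCache {

	private static final class Entry {
		final double cardinality;
		final long computeNanos; // how long this estimate took to produce

		Entry(double cardinality, long computeNanos) {
			this.cardinality = cardinality;
			this.computeNanos = computeNanos;
		}
	}

	private final int maxSize;
	private final Map<String, Entry> cache = new ConcurrentHashMap<>();

	CardinalityCache(int maxSize) {
		this.maxSize = maxSize;
	}

	double getOrCompute(String patternKey, DoubleSupplier estimator) {
		Entry cached = cache.get(patternKey);
		if (cached != null) {
			return cached.cardinality;
		}
		long start = System.nanoTime();
		double cardinality = estimator.getAsDouble();
		long elapsed = System.nanoTime() - start;
		if (cache.size() >= maxSize) {
			// Evict the entry that was cheapest to compute: re-estimating it later is
			// inexpensive, while costly estimates (e.g. ?s rdf:type ?o) stay cached longer.
			cache.entrySet().stream()
					.min(Comparator.comparingLong((Map.Entry<String, Entry> e) -> e.getValue().computeNanos))
					.ifPresent(e -> cache.remove(e.getKey()));
		}
		cache.put(patternKey, new Entry(cardinality, elapsed));
		return cardinality;
	}
}
```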

There is still the option of using the code from the ExtensibleStore which uses HyperLogLog. I think the biggest issue with that code is that it requires refreshing if there is a lot of churn (data added, removed and then added again). During that refresh cycle it needs to read all the data from the database, which would probably block all writers.
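
To make the churn problem concrete, here is a small, self-contained sketch (not the actual ExtensibleStore code; the class names and the refresh heuristic are invented). A HyperLogLog sketch supports adds but not removals, so deletions accumulate as staleness until a full rescan rebuilds it; the HashSet-backed stand-in below only exists to keep the example runnable.

```java
import java.util.HashSet;
import java.util.Set;

interface CardinalitySketch {
	void offer(Object o);
	long cardinality();
}

// Stand-in for a real HyperLogLog sketch, just to keep the example self-contained.
class ExactSketch implements CardinalitySketch {
	private final Set<Object> seen = new HashSet<>();
	public void offer(Object o) { seen.add(o); }
	public long cardinality() { return seen.size(); }
}

class ChurnAwareStatistics {
	private CardinalitySketch subjects = new ExactSketch();
	private long removalsSinceRefresh;
	private long sizeAtLastRefresh;

	void added(Object subject) {
		subjects.offer(subject); // sketches support add ...
	}

	void removed(Object subject) {
		removalsSinceRefresh++; // ... but not remove, so deletions make the sketch stale
	}

	boolean needsRefresh() {
		// Invented heuristic: rebuild once removals exceed a fraction of the data seen so far.
		return removalsSinceRefresh > Math.max(1000, sizeAtLastRefresh / 10);
	}

	void refresh(Iterable<Object> allSubjects) {
		// The expensive part: a full scan of the store, which would likely block writers.
		CardinalitySketch fresh = new ExactSketch();
		long size = 0;
		for (Object s : allSubjects) {
			fresh.offer(s);
			size++;
		}
		subjects = fresh;
		sizeAtLastRefresh = size;
		removalsSinceRefresh = 0;
	}

	long estimatedDistinctSubjects() {
		return subjects.cardinality();
	}
}
```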

hmottestad avatar Sep 01 '22 07:09 hmottestad

> > Looks good. I also thought about integrating the estimate functions of libmdbx into LMDB: https://libmdbx.dqdkfa.ru/group__c__rqest.html It should be rather easy to patch LMDB, but that would also mean that we have to maintain a patched version of the LWJGL bindings...

> Do you think it would be possible to contribute it to LMDB instead?

I've already posted a question on the OpenLDAP mailing list where I asked them to implement the feature. Unfortunately, I haven't received a response so far. But maybe I should offer my support regarding this topic...

> Reading through the LMDB documentation I saw that there should be some cursor operations to get the first and last page, if the database is ordered. Maybe there is some way to estimate the size based on the addresses of the first and last page?

This is not possible as the keys are ordered but the b-tree pages are just randomly distributed.

> I remember you mentioned that you were reading a max of 1000 elements from the cursor when estimating the cardinality. Is that still the case? Maybe we could do some optimized caching where we evict cardinality results based on how many resources it took to calculate them, so the cardinality for null rdf:type null null would be cached much longer than the one for :peter rdf:type null null.

Yes, that is still the case. I read a fixed number of entries to determine their mean distance. Then I skip to the last key and try to extrapolate the total number of entries based on the distance between the keys. I'm currently working on optimizing this a bit to fix #3982.
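
Roughly the idea, as a hedged sketch that uses plain long keys instead of the store's serialized LMDB keys (the class, method and parameter names are illustrative, not the actual implementation):

```java
class SampledCardinality {

	/**
	 * Estimate how many keys lie in a sorted key range by measuring the mean
	 * gap between the first sampleSize keys and extrapolating to the last key.
	 */
	static double estimate(long[] sortedKeys, int sampleSize) {
		if (sampleSize < 2 || sortedKeys.length <= sampleSize) {
			return sortedKeys.length; // nothing to extrapolate from, count directly
		}
		long first = sortedKeys[0];
		long lastSample = sortedKeys[sampleSize - 1];
		// Mean distance between adjacent keys within the sample (keys are distinct, so >= 1).
		double meanGap = Math.max(1.0, (double) (lastSample - first) / (sampleSize - 1));
		// Skip to the last key (MDB_LAST on the cursor in the real store).
		long last = sortedKeys[sortedKeys.length - 1];
		// Extrapolate: how many keys with that mean spacing fit between first and last?
		return (last - first) / meanGap + 1;
	}
}
```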

> There is still the option of using the code from the ExtensibleStore which uses HyperLogLog. I think the biggest issue with that code is that it requires refreshing if there is a lot of churn (data added, removed and then added again). During that refresh cycle it needs to read all the data from the database, which would probably block all writers.

I also thought about implementing this but it would also require a considerable amount of storage space.

kenwenzel avatar Sep 01 '22 11:09 kenwenzel

FYI - I am also working on improving the cardinality computation itself. Instead of just using the first n rows as samples to estimate the cardinality, I divide the whole search space into b buckets and then use samples from these buckets for the estimation, as in the sketch below. I will probably push a PR this week.
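
Something along these lines, as an illustrative sketch over a numeric key set (the real code works on LMDB cursors with MDB_SET_RANGE seeks rather than a NavigableSet; the names and heuristics here are invented):

```java
import java.util.NavigableSet;

class BucketSampler {

	static double estimate(NavigableSet<Long> keys, int buckets, int samplesPerBucket) {
		if (keys.size() <= (long) buckets * samplesPerBucket) {
			return keys.size(); // small sets: an exact count is cheap enough
		}
		long first = keys.first();
		long last = keys.last();
		double bucketWidth = (double) (last - first) / buckets;
		double total = 0;
		for (int b = 0; b < buckets; b++) {
			long bucketStart = first + (long) (b * bucketWidth);
			// Seek to the first key at or after the bucket start (MDB_SET_RANGE in LMDB),
			// then read a handful of consecutive keys to measure the local key density.
			Long k = keys.ceiling(bucketStart);
			if (k == null) {
				continue;
			}
			long sampleFirst = k;
			long sampleLast = k;
			int sampled = 1;
			while (sampled < samplesPerBucket && (k = keys.higher(k)) != null) {
				sampleLast = k;
				sampled++;
			}
			if (sampled < 2) {
				total += 1;
				continue;
			}
			double localGap = (double) (sampleLast - sampleFirst) / (sampled - 1);
			// Extrapolate the local density over the whole bucket.
			total += bucketWidth / Math.max(localGap, 1.0);
		}
		return total;
	}
}
```

Seeking to b bucket starts keeps the number of cursor operations bounded while still picking up skew in the key distribution, which a prefix-only sample would miss.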

kenwenzel avatar Sep 12 '22 06:09 kenwenzel

@hmottestad I've tested the cardinality cache but can't see any improvements in our standard query benchmarks. Does this only pay off when using multi-threaded access?

kenwenzel avatar Sep 14 '22 11:09 kenwenzel

I had a scenario with the ShaclSail where it pays off, because the ShaclSail generates a lot of small and similar queries during validation. But now it's hard to reproduce, because I introduced some safety checks into the query optimizer pipeline in a previous PR and those checks are very slow. They only run when assertions are enabled, so they don't affect our users.

hmottestad avatar Sep 14 '22 13:09 hmottestad

@hmottestad Do you have any benchmarks with and without the cache?

kenwenzel avatar Sep 15 '22 13:09 kenwenzel

@hmottestad Do you have any benchmarks with and without the cache?

No. I need to create some LMDB + SHACL benchmarks.

hmottestad avatar Sep 15 '22 17:09 hmottestad

I've looked into this branch again and I can't reproduce the initial performance issues. So I've modified the branch to only add the tests and benchmark.

hmottestad avatar Jul 28 '23 13:07 hmottestad