sist2
sist2 copied to clipboard
Additive vs. Subtractive incremental lmdb building?
When copy_table is far larger than new_table, we may as well first file copy the store and then add/remove entries from it. The db file copy should be done before scanner kicks off, so that the parse process will write/overwrite entries.
Current incremental behaviour:
- Start with an empty store
- During parse, only consider new/updated files and write to the store
- for each file in the original index that was marked for copy, load from the original store and copy it to the new store (could be a little slow)
Desired behaviour:
- Copy the entire original store (this is really fast/efficient)
- During parse, only consider new/updated files and write (or overwrite!) to the store
- For each file in the original index that was marked for deletion, remove from the new store
Is that correct?
One thing I'm worried about is that LMDB will not re-compact the DB, and that deleted entries will just be zeroed out without reclaiming any new space. So if there are many deleted files, the new store will not be smaller than the original store. There might be a way to ask LMDB to force de-fragmentation but I think that this is functionally equivalent to re-writing every key, which means that we're back to square one in terms of performance.
Yes exactly this.
FYI: https://www.openldap.org/lists/openldap-technical/201407/msg00134.html (note, this message is from 2014)
Fragmentation is a problem for larger-than-4k entries, but fine for smaller ones (probably automatically defrag'ed)
But this problem may be irrelevant if we're talking about repeatedly creating new db files (rather than bending a single mdb instance):
http://www.lmdb.tech/doc/group__mdb.html#ga3bf50d7793b36aaddf6b481a44e24244
During env copy we can do compaction and despite performance lose (vs. a plain copy) it's still much faster than inserting the keys one by one to a new store -- no need to sort the keys, no need for internal maintenance etc.
Okay, got it. That's really useful information, thank you