E3: Removal of KeysTable in `Domain`
PR: Remove keysTable from Domain
This PR removes the use of keysTable from Domain. Below is how this was achieved.
Modifications to ValsTable
ValsTable in domains is now dupsorted and has the following layout:
Key -> Inverted_Step + Value
This allows us to maintain the same ordering with respect to keys while also keeping multiple versions of each key. This slightly improves cases where one key has multiple values over different steps, as we decouple the key. Most importantly, it allows us to track each version of the key without needing another table, thereby eliminating the external step.
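To illustrate why an inverted step keeps the newest version first, here is a minimal sketch. It assumes the step is a `uint64` stored as 8 big-endian bytes of `^step`; the helper names (`invertedStep`, `dupValue`, `latest`) are hypothetical and not taken from the PR. Sorting the duplicates of one key byte-wise, as a dupsorted table would, puts the highest step at the front.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// invertedStep encodes ^step as 8 big-endian bytes (assumed encoding),
// so that larger (newer) steps compare smaller byte-wise.
func invertedStep(step uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, ^step)
	return b
}

// dupValue builds the value stored under a key: Inverted_Step + Value.
func dupValue(step uint64, val []byte) []byte {
	return append(invertedStep(step), val...)
}

// latest sorts the duplicates of a single key byte-wise and returns the
// step and value of the first one. It expects at least one duplicate.
func latest(dups [][]byte) (uint64, []byte) {
	sorted := append([][]byte(nil), dups...)
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(sorted[i], sorted[j]) < 0
	})
	first := sorted[0]
	return ^binary.BigEndian.Uint64(first[:8]), first[8:]
}

func main() {
	dups := [][]byte{
		dupValue(1, []byte("old")),
		dupValue(3, []byte("new")),
		dupValue(2, []byte("mid")),
	}
	step, val := latest(dups)
	fmt.Printf("step=%d value=%s\n", step, val) // step=3 value=new
}
```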
ValsTable: Code (largeVals=true)
For Code, this is not possible because it has large values, which are problematic for MDBX dupsort tables. Therefore, the Code table is kept as non-dupsort but modified with the layout:
Key + Inverted_Step -> Value
However, this only works if keys are of fixed size. This is because Seek compares keys lexicographically, byte by byte, and does not consider key length first. For example:
AAAB + FFFFFA (Key + Inv_Step)
AAABA + FFFFFA
If we seek AAAB, we will actually get the value for AAABA, as the inverted step interferes.
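The example above can be reproduced with a small in-memory sketch. The `seek` and `buildTable` helpers are hypothetical stand-ins for a cursor Seek over a sorted table, and the keys and inverted step are the literal strings from the example rather than real encodings.

```go
package main

import (
	"fmt"
	"sort"
)

// seek mimics a cursor Seek: it returns the first entry >= target in
// byte-wise order, or false if there is none.
func seek(sortedKeys []string, target string) (string, bool) {
	i := sort.SearchStrings(sortedKeys, target)
	if i == len(sortedKeys) {
		return "", false
	}
	return sortedKeys[i], true
}

// buildTable appends the same inverted step to every key and sorts the
// result byte-wise, like a non-dupsort table with Key + Inverted_Step.
func buildTable(invStep string, keys ...string) []string {
	table := make([]string, 0, len(keys))
	for _, k := range keys {
		table = append(table, k+invStep)
	}
	sort.Strings(table)
	return table
}

func main() {
	// Variable-length keys: the longer key sorts first because 'A' < 'F'.
	variable := buildTable("FFFFFA", "AAAB", "AAABA")
	got, _ := seek(variable, "AAAB")
	fmt.Println(got) // AAABAFFFFFA

	// Fixed-size keys: Seek lands on the key we asked for.
	fixed := buildTable("FFFFFA", "AAAB", "AAAC")
	got, _ = seek(fixed, "AAAB")
	fmt.Println(got) // AAABFFFFFA
}
```

With fixed-size keys, the first byte after the key always belongs to the inverted step, so no other key can sort in between.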
Collate
In collate, for largeVals, we need to sort the keys afterward (for the same reason). Maybe we can replace that with a collector to avoid the extra linear RAM usage for code.
Results
- One less random read during getLatestFromDB.
- ChainData size after 10 days of running (2 prunes happened):
admin@erigon5900d:~/erigon$ du gg/chaindata/ -ch
12G gg/chaindata/
12G total
- Faster flushing of domains by about 25%
- getLatestFromDB was reworked to use valsTable: it is just a Seek plus a check that the stripped key == input key. Commitment has variable-length keys, so instead we move the cursor to where the last element (step=0) would be and call Prev() until we get to the first. Overall it is one less random lookup in exchange for a few extra sequential ones, so reading speed should improve slightly.
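The two lookup strategies described in the last item can be sketched over an in-memory sorted table. This is one possible reading of the description, not the code from the PR: the `table` type, `latestFixed`, and `latestVariable` are hypothetical, and the inverted step is assumed to be 8 big-endian bytes of `^step` appended to the key.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

type entry struct{ k, v []byte }

// table is a stand-in for a non-dupsort table: Key + Inverted_Step -> Value.
type table struct{ entries []entry }

func invertedStep(step uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, ^step)
	return b
}

// put inserts one version of a key and keeps entries sorted byte-wise.
func (t *table) put(key string, step uint64, val string) {
	k := append([]byte(key), invertedStep(step)...)
	t.entries = append(t.entries, entry{k, []byte(val)})
	sort.Slice(t.entries, func(i, j int) bool {
		return bytes.Compare(t.entries[i].k, t.entries[j].k) < 0
	})
}

// seek returns the index of the first entry with k >= target.
func (t *table) seek(target []byte) int {
	return sort.Search(len(t.entries), func(i int) bool {
		return bytes.Compare(t.entries[i].k, target) >= 0
	})
}

// latestFixed: Seek(key), then check that the stripped key equals the input.
func (t *table) latestFixed(key string) (string, bool) {
	i := t.seek([]byte(key))
	if i == len(t.entries) {
		return "", false
	}
	k := t.entries[i].k
	if len(k) < 8 || string(k[:len(k)-8]) != key {
		return "", false
	}
	return string(t.entries[i].v), true
}

// latestVariable: position the cursor where step=0 would be (key + ^0),
// then walk backwards, remembering the earliest entry whose stripped key
// matches. Entries of longer keys sharing the prefix are skipped.
func (t *table) latestVariable(key string) (string, bool) {
	target := append([]byte(key), invertedStep(0)...)
	i := t.seek(target)
	if i == len(t.entries) || !bytes.Equal(t.entries[i].k, target) {
		i-- // step back to the last entry below the step=0 position
	}
	val, found := "", false
	for ; i >= 0 && bytes.HasPrefix(t.entries[i].k, []byte(key)); i-- {
		if len(t.entries[i].k) == len(key)+8 {
			val, found = string(t.entries[i].v), true
		}
	}
	return val, found
}

func main() {
	t := &table{}
	t.put("ab", 1, "v1")
	t.put("ab", 5, "v5")
	t.put("abc", 9, "x9")
	fmt.Println(t.latestFixed("ab"))    // Seek lands on "abc": no match
	fmt.Println(t.latestVariable("ab")) // v5 true
}
```

In this sketch the backward walk visits every entry that shares the prefix, which is where the extra sequential reads come from.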
This could cause a full scan on each query, since commitment reads from the smallest prefix to the largest during unfold. I want to exploit that fact by storing bt stateful cursors, but the first experiment failed.
Prune already had GetExecV3PruneProgress and SaveExecV3PruneProgress, and Domain itself is responsible for keeping track of its progress, not the aggregator.
Collate should use ETL instead of slice sorting due to potential sizes of collations.
@Giulio2002 maybe let's add a startup check of the db version (or something else) and print "please rm -rf chaindata"? Just to prevent startup on an incompatible db and fail fast.
FYI: we have var DBSchemaVersion = types.VersionReply{Major: 6, Minor: 1, Patch: 0}. I don't remember whether erigon will auto-check if we set Major: 7 here.
I guess I am checking.
Ok, adding the check is trivial, I will do it in a secondary PR
Ok, nevermind, I added the proper check and error message within the PR
I tested in various scenarios and the file hashes matched, so I am merging it.