[Chainstate] Add migration path and new storage access methods for switching from hex-serialized to binary-serialized data
Per #6449, a lot of columns throughout the system today are stored as hexadecimal strings. We should instead use a binary representation.
The migration will be expensive, so an offline migration command will need to be implemented. The node should support both the hex and binary representations for at least one release in order to give people time to do the migration on their own schedules.
To elaborate more on this, the implementation will look as follows:
- Every `struct` that has a `FromRow` or a `FromColumn` implementation that encodes or decodes the data as a hex string will have two code paths:
  - The original encode/decode paths, where a hex string will be used. When storing the data, the DB schema will be checked to see whether or not the binary or hex representation should be used.
  - A new binary representation, where the bytes will be written directly. These column values will be given a byte prefix that cannot be an ASCII hex character, so the decode path will know how to decode the bytes (hex or binary). This doesn't require any schema changes -- fortunately for us, sqlite does not enforce column types, so `TEXT` will continue to suffice for now.
- A migration command for `stacks-core`, which will carry out the DB migrations on the sortition DB, chainstate DB, staging blocks DB, burnchain DB, mempool DB, SPV DB, and MARF index DBs.
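To make the prefix idea concrete, here is a minimal sketch of the two code paths and of a per-value migration step. It is not the planned implementation: the prefix value `0x00` and the names `BINARY_PREFIX`, `encode_binary`, `encode_hex`, `decode_column`, and `migrate_value` are all hypothetical, and the real code would live inside the `FromRow` / `FromColumn` implementations rather than in free functions.

```rust
// Hypothetical tag byte for binary-encoded column values. Any byte that is
// not an ASCII hex digit would work; 0x00 is an assumption for this sketch.
const BINARY_PREFIX: u8 = 0x00;

#[derive(Debug, PartialEq)]
enum DecodeError {
    OddLength,
    BadHexDigit(u8),
}

// New path: write the prefix byte, then the raw bytes.
fn encode_binary(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 1);
    out.push(BINARY_PREFIX);
    out.extend_from_slice(payload);
    out
}

// Original path: lowercase hex text with no prefix.
fn encode_hex(payload: &[u8]) -> Vec<u8> {
    let mut out = String::with_capacity(payload.len() * 2);
    for b in payload {
        out.push_str(&format!("{:02x}", b));
    }
    out.into_bytes()
}

// Map one ASCII hex digit to its 4-bit value.
fn hex_val(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

fn decode_hex(text: &[u8]) -> Result<Vec<u8>, DecodeError> {
    if text.len() % 2 != 0 {
        return Err(DecodeError::OddLength);
    }
    let mut out = Vec::with_capacity(text.len() / 2);
    for pair in text.chunks(2) {
        let hi = hex_val(pair[0]).ok_or(DecodeError::BadHexDigit(pair[0]))?;
        let lo = hex_val(pair[1]).ok_or(DecodeError::BadHexDigit(pair[1]))?;
        out.push((hi << 4) | lo);
    }
    Ok(out)
}

// Decode path: the first byte decides which representation is in the column.
fn decode_column(raw: &[u8]) -> Result<Vec<u8>, DecodeError> {
    match raw.first() {
        Some(&BINARY_PREFIX) => Ok(raw[1..].to_vec()),
        _ => decode_hex(raw),
    }
}

// Migration step for a single value: decode whatever is stored, then
// re-encode as binary. Values that are already binary come back unchanged.
fn migrate_value(raw: &[u8]) -> Result<Vec<u8>, DecodeError> {
    Ok(encode_binary(&decode_column(raw)?))
}
```

Because a hex string can never start with the prefix byte, `decode_column` needs no schema information to pick a path, and `migrate_value` can be re-run safely on a partially migrated table.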
I don't intend to do this in one shot. My implementation plan is as follows. Each task will leave stacks-core in a working state.
1. Add the new encode/decode paths for `FromRow` and `FromColumn` definitions, add the migration command, and add the migration logic for the sortition DB
2. Add migration logic for the chainstate DB
3. Add migration logic for the burnchain DB
4. Add migration logic for the mempool DB
5. Add migration logic for the SPV DB
6. Add migration logic for the MARF index DBs
(Steps 2-6 can be done in parallel).
In addition to FromRow and FromColumn structs, we should handle the following structs specially, since they contain lots of binary data but are not serialized as hex strings. Instead, I'll add a SIP-003 representation (which will have a different byte prefix than a raw binary string).
- `Vec<MessageSignature>` (used in Nakamoto block headers)
- `MissedBlockCommit`
- `Vec<PoxAddress>` (used in `LeaderBlockCommitOp`)
- `Vec<Treatment>` (used in `LeaderBlockCommitOp`)
I'll update this list as I find more.
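As a rough illustration of how a second prefix could distinguish the structured representation from a raw binary string, here is a sketch for a list of signatures. Everything in it is an assumption made for the example: the prefix values, the names `classify`, `encode_signatures`, and `decode_signatures`, the 65-byte signature length, and the layout (a 4-byte big-endian item count followed by the items, in the style of SIP-003 vector encoding). The real code would presumably reuse the existing codec rather than hand-roll this.

```rust
// Hypothetical tag bytes; neither is an ASCII hex digit.
const RAW_BINARY_PREFIX: u8 = 0x00;
const STRUCTURED_PREFIX: u8 = 0x01;
// Assumed signature size for this sketch.
const SIGNATURE_LEN: usize = 65;

#[derive(Debug, PartialEq)]
enum Encoding {
    Hex,
    RawBinary,
    Structured,
}

// Pick the decode path from the first byte of the stored value.
fn classify(raw: &[u8]) -> Encoding {
    match raw.first() {
        Some(&RAW_BINARY_PREFIX) => Encoding::RawBinary,
        Some(&STRUCTURED_PREFIX) => Encoding::Structured,
        _ => Encoding::Hex,
    }
}

// Layout: prefix byte, 4-byte big-endian item count, then each signature.
fn encode_signatures(sigs: &[[u8; SIGNATURE_LEN]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + 4 + sigs.len() * SIGNATURE_LEN);
    out.push(STRUCTURED_PREFIX);
    out.extend_from_slice(&(sigs.len() as u32).to_be_bytes());
    for sig in sigs {
        out.extend_from_slice(sig);
    }
    out
}

// Returns None if the prefix is wrong or the body length does not match
// the declared item count.
fn decode_signatures(raw: &[u8]) -> Option<Vec<[u8; SIGNATURE_LEN]>> {
    let (&tag, rest) = raw.split_first()?;
    if tag != STRUCTURED_PREFIX || rest.len() < 4 {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&rest[..4]);
    let count = u32::from_be_bytes(len_bytes) as usize;
    let body = &rest[4..];
    if body.len() != count.checked_mul(SIGNATURE_LEN)? {
        return None;
    }
    let mut sigs = Vec::with_capacity(count);
    for chunk in body.chunks_exact(SIGNATURE_LEN) {
        let mut sig = [0u8; SIGNATURE_LEN];
        sig.copy_from_slice(chunk);
        sigs.push(sig);
    }
    Some(sigs)
}
```

Keeping the structured tag separate from the raw-binary tag means a reader of the column can tell all three representations apart from the first byte alone.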