penumbra
pd: support state tarball for joining nodes
When chain upgrades are performed (#1804), pd state may be collapsed by a migration, such that late-joining nodes (i.e. nodes that join the network after the upgrade boundary has passed) will not be able to verify historical state. To support late-joining nodes, we must provide the capability for `pd testnet join` to accept compressed archives of historical pd state, and use them during bootstrapping.
Proposal: add a new optional flag `--snapshot-url=<URL>` to `pd testnet join`. Doing so will allow late-joining nodes to pull down a compressed archive from a remote URL, and extract that archive as starting state for pd.
Specifically, this requires:
- [ ] Defining archive format and structure (e.g. "all files/directories should be extracted to `~/.penumbra/testnet_data/node0/pd`"). My understanding is we'll need at least 1) the RocksDB directory and 2) the genesis file in all cases.
- [ ] Providing hosting capability for future snapshots (ideally community validators will assist with this process, but we still need to host snapshots we create somewhere).
- [ ] Write logic for `pd testnet join --snapshot-url <url>`.
- [ ] Write user-facing documentation for using the flag.
- [ ] Write developer-facing documentation for storing and updating snapshots.
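The download-and-extract step in the checklist above can be illustrated with a minimal sketch. This is not pd's implementation (pd is written in Rust); it is a Python illustration under stated assumptions. The function name `fetch_and_extract` is hypothetical, and the archive is assumed to be a compressed tarball whose entries are relative to the pd home directory, since the archive layout is still to be defined in this ticket.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_extract(archive_url: str, pd_home: Path) -> list[str]:
    """Download a compressed tar archive and unpack it into pd_home.

    Hypothetical helper: assumes archive entries are relative to pd_home.
    Returns the names of the archive members that were extracted.
    """
    pd_home = pd_home.resolve()
    pd_home.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as scratch:
        download = Path(scratch) / "pd-state-archive"
        # Stream the download to disk instead of holding it in memory.
        with urllib.request.urlopen(archive_url) as resp, open(download, "wb") as f:
            shutil.copyfileobj(resp, f)

        # "r:*" lets tarfile auto-detect gzip, bzip2, or xz compression.
        with tarfile.open(download, mode="r:*") as tar:
            members = tar.getmembers()
            for member in members:
                # Refuse entries that would land outside pd_home.
                target = (pd_home / member.name).resolve()
                if not target.is_relative_to(pd_home):
                    raise ValueError(f"unsafe path in archive: {member.name}")
            # Use the stricter "data" extraction filter where the running
            # Python version provides it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(pd_home, members=members, **extra)

    return [m.name for m in members]
```

Rejecting entries that resolve outside the target directory matters here because the archive comes from a remote URL that the joining node does not control.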
What's the advantage of doing this rather than providing a `.tar.xz` of the `pd` home directory?
As I understand it, that's what the snapshot is: a compressed version of the rocksdb info that pd uses. It must also include a genesis file, which is not included in the pd home directory, but easy enough to overwrite when generating new configs. This ticket is essentially describing the need and the mechanism to "provide a .tar.xz of the pd home directory."
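For illustration, producing such an archive could look roughly like the sketch below. It is an assumption-laden example, not the project's tooling: the function name `create_state_archive`, the `rocksdb` directory name inside the pd home directory, and the `genesis.json` entry name are all hypothetical, and the node is assumed to be stopped so the database files are not changing while they are read.

```python
import tarfile
from pathlib import Path


def create_state_archive(pd_home: Path, genesis: Path, out: Path) -> Path:
    """Bundle the pd RocksDB directory and a genesis file into a .tar.xz.

    Hypothetical helper: the entry names "rocksdb" and "genesis.json" are
    assumptions, not a format defined by this ticket.
    """
    with tarfile.open(out, mode="w:xz") as tar:
        # Adding a directory is recursive by default.
        tar.add(pd_home / "rocksdb", arcname="rocksdb")
        # The genesis file lives outside the pd home directory, so it is
        # added explicitly under a fixed name.
        tar.add(genesis, arcname="genesis.json")
    return out
```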
Got it, I was confused by the term "snapshot" because CometBFT has a notion of p2p snapshot exchange, which we're not currently using.
Thanks, edited for clarity: `s/snapshot/archive/` throughout.
Setting this to P-high, since it is a requirement for performing a testnet upgrade (both for compaction and migrations) and must be assigned during sprint planning.
@erwanor I'll grab this one and give it a shot, trying to parallelize the work with what you've already got in flight on the upgrades front.
Resolved via #4055, also #4093.
> Write user-facing documentation for using the flag.
Ah, still more to go. Working on this today.