
pd: support state tarball for joining nodes

Open conorsch opened this issue 1 year ago • 4 comments

When chain upgrades are performed (#1804), pd state may be collapsed by a migration, such that late-joining nodes (i.e. nodes that join the network after the upgrade boundary has passed) will not be able to verify historical state. To support late-joining nodes, we must provide the capability for pd testnet join to accept compressed archives of historical pd state, and use them during bootstrapping.

Proposal: add new optional flag --snapshot-url=<URL> to pd testnet join. Doing so will allow late-joining nodes to pull down a compressed archive from a remote URL, and extract that archive as starting state for pd.

Specifically, this requires:

  • [ ] Define the archive format and structure (e.g. "all files/directories should be extracted to ~/.penumbra/testnet_data/node0/pd"). My understanding is that we'll need at least 1) the RocksDB directory and 2) the genesis file in all cases.
  • [ ] Provide hosting for future snapshots (ideally community validators will assist with this process, but we still need somewhere to host the snapshots we create).
  • [ ] Write logic for pd testnet join --snapshot-url <url>.
  • [ ] Write user-facing documentation for using the flag.
  • [ ] Write developer-facing documentation for storing and updating snapshots.
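
For illustration, the download-and-extract step behind the proposed --snapshot-url flag could look roughly like the Python sketch below. This is not pd's implementation: the function names and the temporary filename are hypothetical, and it assumes the archive is a (possibly compressed) tarball whose members are meant to land directly in the pd home directory. Because the archive comes from a remote URL, the sketch refuses any member that would be written outside the target directory.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def download_archive(url: str, dest: Path) -> Path:
    """Stream the archive at `url` to `dest` without loading it into memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def extract_archive(archive: Path, pd_home: Path) -> list[str]:
    """Unpack `archive` into `pd_home`, rejecting members that escape it."""
    pd_home.mkdir(parents=True, exist_ok=True)
    root = pd_home.resolve()
    # "r:*" lets tarfile auto-detect gzip, bzip2, or xz compression.
    with tarfile.open(archive, "r:*") as tar:
        members = tar.getmembers()
        for member in members:
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.issym() or member.islnk():
                raise ValueError(f"links are not allowed: {member.name}")
        tar.extractall(root, members=members)
    return [m.name for m in members]


def bootstrap_from_archive(url: str, pd_home: Path) -> list[str]:
    """Hypothetical join helper: fetch the archive, then seed `pd_home` from it."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = download_archive(url, Path(tmp) / "pd-state.tar")
        return extract_archive(archive, pd_home)
```

A caller would pass the URL given on the command line and the node's pd directory, e.g. `bootstrap_from_archive(url, Path.home() / ".penumbra/testnet_data/node0/pd")`. Verifying a checksum of the downloaded file before extraction would be a natural addition, but the issue does not specify one.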

conorsch, Feb 16 '24 22:02

What's the advantage of doing this rather than providing a .tar.xz of the pd home directory?

hdevalence, Feb 16 '24 22:02

As I understand it, that's what the snapshot is: a compressed version of the rocksdb info that pd uses. It must also include a genesis file, which is not included in the pd home directory, but easy enough to overwrite when generating new configs. This ticket is essentially describing the need and the mechanism to "provide a .tar.xz of the pd home directory."
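
As a rough sketch of the producing side, an archive like the one described here could be assembled as follows. The function name, the parameter names, and the member names inside the tarball ("rocksdb", "genesis.json") are assumptions for illustration; the actual layout is exactly what the first checklist item still has to define. It also assumes the node is stopped while the database directory is copied, since archiving a live database may capture an inconsistent state.

```python
import tarfile
from pathlib import Path


def create_state_archive(rocksdb_dir: Path, genesis_file: Path, out: Path) -> Path:
    """Write a .tar.xz holding the RocksDB directory and the genesis file."""
    with tarfile.open(out, "w:xz") as tar:
        # Directories are added recursively; arcname fixes the path stored
        # in the archive so it does not depend on where the source lives.
        tar.add(rocksdb_dir, arcname="rocksdb")
        tar.add(genesis_file, arcname="genesis.json")
    return out
```

Storing fixed member names rather than absolute source paths keeps the archive relocatable, so a joining node can extract it into whatever pd home directory it uses.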

conorsch, Feb 16 '24 22:02

Got it, I was confused by the term "snapshot" because CometBFT has a notion of p2p snapshot exchange, which we're not currently using.

hdevalence, Feb 16 '24 22:02

Thanks, edited for clarity, s/snapshot/archive/ throughout.

conorsch, Feb 16 '24 22:02

Setting this to P-high, since it is a requirement for performing a testnet upgrade (both for compaction and for migrations). It must be assigned during sprint planning.

erwanor, Mar 18 '24 16:03

@erwanor I'll grab this one and give it a shot, trying to parallelize the work with what you've already got in flight on the upgrades front.

conorsch, Mar 18 '24 20:03

Resolved via #4055, also #4093.

conorsch, Mar 25 '24 15:03

> Write user-facing documentation for using the flag.

Ah, still more to go. Working on this today.

conorsch, Mar 25 '24 15:03