
Support zstd genesis archives

Open steviez opened this issue 2 years ago • 7 comments

Taking over https://github.com/solana-labs/solana/pull/33614 from @ripatel-fd; we exchanged some DMs and agreed I could take this off his hands and drive it to completion.

Problem

  • BZip2 is deprecated in most of the Solana protocol.
  • Genesis archives are one of the few places where BZip2 is still the main compression algorithm.
  • solana-test-validator creates a tar subprocess when creating new ledgers.

Summary of Changes

  • Allows the validator to discover and load genesis.tar.zst archives (continues to support genesis.tar.bz2)
  • Uses the tar crate instead of a subprocess to create new genesis archives.
  • Uses genesis.tar.zst when creating new ledgers
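As a rough illustration of the discovery step in the first bullet, here is a minimal sketch that uses only the standard library. The function name `find_genesis_archive` and the constant `GENESIS_ARCHIVE_NAMES` are hypothetical and not taken from the PR; the only things borrowed from the description are the two file names and the idea that zstd is preferred while bz2 stays supported.

```rust
use std::path::{Path, PathBuf};

/// Archive names in the order they are tried (hypothetical constant):
/// the new zstd name first, then the legacy bzip2 name.
const GENESIS_ARCHIVE_NAMES: [&str; 2] = ["genesis.tar.zst", "genesis.tar.bz2"];

/// Returns the path of the first genesis archive that exists in `ledger_dir`,
/// or `None` if neither archive is present.
fn find_genesis_archive(ledger_dir: &Path) -> Option<PathBuf> {
    GENESIS_ARCHIVE_NAMES
        .iter()
        .map(|name| ledger_dir.join(name))
        .find(|path| path.is_file())
}
```

Calling `find_genesis_archive(Path::new("ledger"))` would return the `.zst` path when both archives exist, so a ledger that still only has `genesis.tar.bz2` keeps loading as before.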

The force pushes were to rebase onto the tip of master in order to run against the latest changes.

steviez avatar Nov 14 '23 21:11 steviez

do we intend to migrate the public clusters to zstd genesis? if so, how are we going to bootstrap?

t-nelson avatar Nov 15 '23 19:11 t-nelson

> do we intend to migrate the public clusters to zstd genesis? if so, how are we going to bootstrap?

I don't think we have to necessarily; any reason not to continue to support both in tandem? If we did want to fully migrate to zstd / phase out bz2, I think that'd look something like:

  • Get mainnet-beta (mnb) to a version where the client can serve zstd (whichever branch this lands in; call it v1.18)
  • Make a change for clients to start requesting zstd (v1.19)
  • Stop serving bz2 in v1.20

I'm not sure we'd want to change this value, though, at the risk of breaking things outside our codebase that might have it hardcoded: https://github.com/solana-labs/solana/blob/6a5b8e86f3c492a4984e780591cc97a027a59a8a/sdk/src/genesis_config.rs#L38

steviez avatar Nov 15 '23 20:11 steviez

CI failed with localnet. Namely, the non-bootstrap node failed to get genesis:

[2023-11-14T23:34:37.828034903Z WARN  solana_validator::bootstrap]
Failed to load genesis config: Unable to open "/solana/config/validator/genesis.bin": Os { code: 2, kind: NotFound, message: "No such file or directory" }

It is requesting the old archive as noted here: https://github.com/solana-labs/solana/blob/5658d6ee5bcf132b60f94857624f92cb2239706e/genesis-utils/src/lib.rs#L53-L62

On the bootstrap node, I can see a log that it attempted to serve the old file format but that didn't yield an actual transfer:

[2023-11-14T23:34:37.823021949Z INFO  solana_rpc::rpc_service]
get /genesis.tar.bz2 -> "/solana/config/bootstrap-validator/genesis.tar.bz2" (0 bytes)

I thought there was some code somewhere that created the archive if you didn't have it; either I'm misremembering or that code is not getting hit for some reason. Will dig in further

steviez avatar Nov 17 '23 05:11 steviez

> I thought there was some code somewhere that created the archive if you didn't have it; either I'm misremembering or that code is not getting hit for some reason. Will dig in further

don't think this is the case. only place i see a bz2 encoder is snapshot and bigtable

t-nelson avatar Nov 17 '23 20:11 t-nelson

> don't think this is the case. only place i see a bz2 encoder is snapshot and bigtable

Also took a look and I believe you're correct; skimming the logic, it looks like a node run WITHOUT --no-genesis-fetch will still download the archive, even if it has a genesis.bin locally. That might be what I'm remembering, and we should probably adjust the logic to check for genesis.bin OR genesis archive before issuing a request for another one (seemingly a separate PR for that).

That presents two problems:

  1. How do we know which format to request?
  2. Given that my previous assumption was incorrect, your comment about how we bootstrap this is relevant again.

For 1., I think we could have simple logic to request one first (zstd), and if that doesn't yield a file, then fall back to bz2. For 2., I think we could have nodes that are run with RPC enabled create the genesis.tar.zst archive at startup so they have both.
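To make the two ideas in this comment concrete, here is a minimal sketch, assuming the download itself is abstracted behind a caller-supplied closure. None of these names (`ARCHIVE_NAMES`, `needs_genesis_fetch`, `fetch_genesis_archive`) come from the codebase; they are hypothetical. The first function is the "check for genesis.bin OR genesis archive before issuing a request" idea, and the second is the "try zstd, fall back to bz2" idea, with an empty response treated as a miss, since the bootstrap node's log above showed a 0-byte reply.

```rust
use std::path::Path;

/// Archive names to request, newest format first (hypothetical constant).
const ARCHIVE_NAMES: [&str; 2] = ["genesis.tar.zst", "genesis.tar.bz2"];

/// True only when the ledger directory has neither genesis.bin nor any
/// genesis archive, i.e. when a download is actually needed.
fn needs_genesis_fetch(ledger_dir: &Path) -> bool {
    let has_bin = ledger_dir.join("genesis.bin").is_file();
    let has_archive = ARCHIVE_NAMES
        .iter()
        .any(|name| ledger_dir.join(name).is_file());
    !(has_bin || has_archive)
}

/// Tries each archive name in order and returns the first one for which
/// `fetch` yields a non-empty body. `fetch` stands in for the real download
/// and returns `None` when the peer has no such file.
fn fetch_genesis_archive<F>(mut fetch: F) -> Option<(&'static str, Vec<u8>)>
where
    F: FnMut(&str) -> Option<Vec<u8>>,
{
    for name in ARCHIVE_NAMES {
        match fetch(name) {
            // A 0-byte response counts as "no file", so keep trying.
            Some(bytes) if !bytes.is_empty() => return Some((name, bytes)),
            _ => continue,
        }
    }
    None
}
```

With this shape, a new client talking to an old peer that only has `genesis.tar.bz2` costs one extra failed request, and the returned name tells the caller which decompressor to use.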

Thoughts?

steviez avatar Nov 18 '23 01:11 steviez

i think to keep this pr small, let's just support reading zstd compressed genesis archives. we can add generation to solana-genesis in a followup. then worry about the if/how to migrate existing bz2 clusters afterwards

t-nelson avatar Nov 27 '23 19:11 t-nelson

> i think to keep this pr small, let's just support reading zstd compressed genesis archives. we can add generation to solana-genesis in a followup. then worry about the if/how to migrate existing bz2 clusters afterwards

Works for me

steviez avatar Nov 27 '23 21:11 steviez

I'll reopen this in Agave; it has been on the back burner for quite a while now, but it should be quick to push through.

steviez avatar Feb 23 '24 16:02 steviez