
Embed postage snapshot in binary

Open Cafe137 opened this issue 8 months ago • 3 comments

Summary

Embed a postage snapshot into the binary to skip the syncing of ~8 million blocks.

Motivation

  • Most non-archival Gnosis nodes are now purging old postage contract events, making syncing impossible. This also increases system requirements for self-hosted Gnosis nodes
  • Following the official quick start guide, postage syncing takes 80% of the setup time
  • Given there are around 1,800 active batches, the snapshot - even uncompressed - would be ~0.25MB, which is small enough to embed efficiently.

Implementation

  • [ ] Provide a way to generate the postage snapshot independently. This could be a standalone tool, or a Bee command, that takes blockchain-rpc-endpoint and end-block parameters, and outputs a bin file.
  • [ ] Embed the snapshot bin in the Bee binary, compressed or uncompressed
  • [ ] Modify the syncing logic to use the snapshot and skip old blocks
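
One way to picture the last step is as choosing the block from which event syncing starts. The following is only a minimal sketch: the `Snapshot` type, its `EndBlock` field, and `syncStartBlock` are hypothetical names for illustration and are not taken from the Bee codebase.

```go
package main

import "fmt"

// Snapshot is a hypothetical in-memory form of the embedded bin file.
type Snapshot struct {
	EndBlock uint64 // last block whose postage events are reflected in the snapshot
}

// syncStartBlock returns the first block the node still has to sync.
// contractStart is the block the postage contract was deployed at;
// lastSynced is the last block already processed locally (0 if none).
func syncStartBlock(contractStart, lastSynced uint64, snap *Snapshot) uint64 {
	start := contractStart
	// With a snapshot, everything up to and including EndBlock can be skipped.
	if snap != nil && snap.EndBlock+1 > start {
		start = snap.EndBlock + 1
	}
	// A node that has already synced past the snapshot keeps its own progress.
	if lastSynced > 0 && lastSynced+1 > start {
		start = lastSynced + 1
	}
	return start
}

func main() {
	snap := &Snapshot{EndBlock: 8_000_000} // illustrative number only
	fmt.Println("start syncing at block", syncStartBlock(1_000, 0, snap))
}
```

Passing `nil` for the snapshot gives back the old behaviour (sync from the contract start block), so the change stays local to where the start block is chosen.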

Drawbacks

  • Requires user trust in the validity of the snapshot, but this is mitigated by making snapshot generation fully reproducible and verifiable
  • Requires maintenance of the snapshot, e.g. updating it before every release
  • The snapshot may eventually become too large to embed

Materials

  • https://github.com/ethersphere/bee/issues/5060 Calculations were done here for the snapshot size
  • https://github.com/ethersphere/bee/issues/5000 This was the previous approach

Cafe137, Apr 09 '25 09:04

@Cafe137 @significance Is it possible to enhance this by storing the snapshot hash on the blockchain? It could let nodes optionally verify the embedded snapshot

gacevicljubisa, Apr 09 '25 14:04

i'm not against this but i am not sure how useful it is really, the security assumption still comes down to the ref in codebase that's running i.e. the contract address in this case. suggest we just check the hash of the snapshot against a hash stored as a constant which is easy to check against github and then we aren't changing trust assumptions but maintain simplicity of implementation/release process. also, for this the worst that will happen is the reserve won't match so it's not so critical 🐰

significance, Apr 09 '25 14:04

hey guys, @nikipapadatou asked me to clarify. i am not an expert in the Bee codebase but the idea is that it should be fairly straightforward to simply replace the old approach i.e. getting the batch data from swarm with a new approach i.e. including the batch data in the binary distribution. everything else should stay the same, data format and so on

it's suggested that a "latest" batch snapshot is kept up to date with a github action in a separate github repo (eg. ethersphere/postage-batch-snapshot) just for this, then when the ci/cd builds the Bee binary it could automatically fetch the latest snapshot and include it in the build

in this way, for the time being, the binary should include the data rather than fetch it from the repo. when the snapshot itself becomes, say, greater than a couple of mb, then we should reconsider our approach
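
for a rough feel of when that could happen, here is a back-of-the-envelope sketch using the figures from the issue (~0.25MB for ~1,800 batches). it assumes the snapshot grows linearly with the batch count and reads "a couple of mb" as 2 MB; both are assumptions, and the function names are made up for illustration.

```go
package main

import "fmt"

// bytesPerBatch derives an average record size from a measured snapshot
// (integer division, so the result is rounded down).
func bytesPerBatch(snapshotBytes, batches int64) int64 {
	return snapshotBytes / batches
}

// batchesAtLimit estimates how many batches fit under limitBytes,
// assuming size scales linearly with the number of batches.
func batchesAtLimit(limitBytes, snapshotBytes, batches int64) int64 {
	return limitBytes * batches / snapshotBytes
}

func main() {
	const snapshotBytes, batches = 250_000, 1_800 // figures quoted in the issue
	const limitBytes = 2_000_000                  // assumed reading of "a couple of mb"

	fmt.Println("approx bytes per batch:", bytesPerBatch(snapshotBytes, batches))
	fmt.Println("approx batches at limit:", batchesAtLimit(limitBytes, snapshotBytes, batches))
}
```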

feel free to @ me for anything ( :

significance, Apr 10 '25 13:04