
Setting up archive node sets

Open holiman opened this issue 2 years ago • 3 comments

Lifted up from https://github.com/ethereum/go-ethereum/issues/24413#issuecomment-1048536599. Opened here for discussion, to see if there's anything we can/should do to make this easier.

Some data-providers would benefit from a scenario like the following:

  1. Node A has (all) state for blocks 0-2M,
  2. Node B has all state for blocks 2M-3M,
  3. ...and so on, up to Node N, which has state from 13M to head.

This is possible, but would require a bit of coding, and some special setup.

The way to create an "archive node from 1M to 2M", currently, would be to:

  1. Use syncmode=full until 1M,
  2. Do a state-pruning
    • After pruning, you can also copy the datadir for use with the 2M-3M node, which needs to continue without gcmode=archive
  3. Use syncmode=full gcmode=archive between 1M and 2M
  4. Stop the node
  5. Run the node with --nodiscover --maxpeers=0.
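The steps above can be sketched as shell commands. The datadir paths are illustrative, and since geth has no flag to exit at a given block, the stops at 1M and 2M have to be done by hand:

```shell
# 1. Full-sync (non-archive) up to block 1M, then stop the node manually.
geth --datadir /data/archive-1m-2m --syncmode=full

# 2. Prune the historical state below the current head.
geth --datadir /data/archive-1m-2m snapshot prune-state

#    Optionally copy the pruned datadir as the starting point for the
#    node that will cover 2M-3M.
cp -r /data/archive-1m-2m /data/archive-2m-3m

# 3. Continue in archive mode between 1M and 2M, then stop manually.
geth --datadir /data/archive-1m-2m --syncmode=full --gcmode=archive

# 4./5. Serve the finished range without looking for peers.
geth --datadir /data/archive-1m-2m --nodiscover --maxpeers=0
```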

I guess the one thing lacking to script up such a scenario right now is that we don't have a way to stop at a certain block, e.g. geth ...args.. --exit-at=2000000.

Another useful option would be to extend gcmode, so that one could say e.g. gcmode=0:full,1000000:archive,2000000:full: it would be given a set of N:&lt;mode&gt; entries in increasing order, and would automatically switch modes at the given block numbers. I'll file this as a potential feature.
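A minimal sketch of how such a schedule flag could be parsed and queried. The flag format and the names parseGCModeSchedule / modeAt are hypothetical, not part of geth:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// switchPoint pairs a block number with the gc mode to use from that block on.
type switchPoint struct {
	block uint64
	mode  string
}

// parseGCModeSchedule parses a hypothetical schedule such as
// "0:full,1000000:archive,2000000:full" into an ordered list of
// switch points, rejecting unknown modes and out-of-order entries.
func parseGCModeSchedule(spec string) ([]switchPoint, error) {
	var points []switchPoint
	for _, part := range strings.Split(spec, ",") {
		kv := strings.SplitN(part, ":", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", part)
		}
		num, err := strconv.ParseUint(kv[0], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad block number in %q: %v", part, err)
		}
		if kv[1] != "full" && kv[1] != "archive" {
			return nil, fmt.Errorf("unknown mode %q", kv[1])
		}
		points = append(points, switchPoint{num, kv[1]})
	}
	ordered := func(i, j int) bool { return points[i].block < points[j].block }
	if !sort.SliceIsSorted(points, ordered) {
		return nil, fmt.Errorf("switch points must be in increasing order")
	}
	return points, nil
}

// modeAt returns the mode in effect at the given block: the mode of the
// last switch point at or below it.
func modeAt(points []switchPoint, block uint64) string {
	mode := points[0].mode
	for _, p := range points {
		if block >= p.block {
			mode = p.mode
		}
	}
	return mode
}

func main() {
	points, err := parseGCModeSchedule("0:full,1000000:archive,2000000:full")
	if err != nil {
		panic(err)
	}
	fmt.Println(modeAt(points, 1500000)) // archive
	fmt.Println(modeAt(points, 2000000)) // full
}
```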

A third way to achieve this would be to have the blocks in separate RLP dumps, and use geth import &lt;0-1M&gt; --gcmode=...; geth snapshot prune-state ...; geth import &lt;1M-2M.rlp&gt; --gcmode=archive ...

This third way does not require any new features, and it's pretty optimal too, since importing from local RLP dumps doesn't require any network IO.
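The import-based flow could look like this (the RLP dump filenames and datadir are illustrative):

```shell
# Import the first range with normal (full) garbage collection.
geth --datadir /data/seg1 import blocks-0-1M.rlp

# Prune the state so only the state at block 1M remains.
geth --datadir /data/seg1 snapshot prune-state

# Import the second range in archive mode, keeping every state root.
geth --datadir /data/seg1 --gcmode=archive import blocks-1M-2M.rlp
```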

— holiman, Feb 23 '22 08:02

It's an interesting idea.

With the approach you mentioned, it's feasible to maintain a set of nodes that segment the full archive state between them. The biggest benefit is that it breaks the assumption that maintaining an archive node requires excellent hardware. And it's reasonably scalable.

Since we will switch to the path-based scheme, I am thinking about how to integrate your idea with it. Under the new scheme, we will only maintain a single latest state plus a set of reverse diffs, and the state can be reverted by applying the reverse diffs. So these nodes could also stop at a specific height and keep all the reverse diffs. The entire reverse-diff history would then be shared by the node set, and the cost of rewinding state is acceptable.
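A toy sketch of state rewind via reverse diffs, using a flat key-value map instead of geth's actual trie structures; the names and layout here are hypothetical, purely to illustrate the idea:

```go
package main

import "fmt"

// reverseDiff records, for one block transition, the previous value of
// every state entry the block modified. An empty string marks a key the
// block created, i.e. one that did not exist before.
type reverseDiff struct {
	block uint64            // the block whose effects this diff undoes
	prev  map[string]string // key -> value before the block executed
}

// rewind applies reverse diffs from head down to target (exclusive),
// mutating state in place. Diffs are assumed to be sorted by block.
func rewind(state map[string]string, diffs []reverseDiff, head, target uint64) {
	for i := len(diffs) - 1; i >= 0; i-- {
		d := diffs[i]
		if d.block > head || d.block <= target {
			continue
		}
		for k, v := range d.prev {
			if v == "" {
				delete(state, k) // key did not exist before this block
			} else {
				state[k] = v
			}
		}
	}
}

func main() {
	// Latest state at block 3; diffs let us walk it back to block 1.
	state := map[string]string{"acct1": "balance=30"}
	diffs := []reverseDiff{
		{block: 2, prev: map[string]string{"acct1": "balance=10"}},
		{block: 3, prev: map[string]string{"acct1": "balance=20"}},
	}
	rewind(state, diffs, 3, 1) // undo blocks 3 and 2
	fmt.Println(state["acct1"]) // balance=10
}
```

In the proposed node set, each node would hold only its own slice of this diff history, so any historical state in its range is reachable from its single persisted state.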

— rjl493456442, Feb 24 '22 05:02

Personally, I think we can offer the necessary functions in Geth, and the geth cluster (archive cluster) can be implemented as a separate project.

— rjl493456442, Feb 24 '22 05:02

This looks like a feasible approach.

In my use case I want archive state more than 180 but less than 1M blocks behind the head block. So it makes sense to wipe the data, sync a full node, prune, and restart it in archive mode every 1M blocks.

So the exit-at flag would be very useful for setting up a hot node to swap over to.

— kamikazechaser, Jan 10 '23 09:01