consensus-specs Multiple fork transitions on same slot/epoch discard old `fork

The result is that certain messages across multiple-fork-transition-per-epoch boundaries might not be verifiable when received.

https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/fork.md#upgrading-the-state states that the fork of the upgraded state is

        fork=Fork(
            previous_version=pre.fork.current_version,
            current_version=ALTAIR_FORK_VERSION,
            epoch=epoch,
        ),

https://github.com/ethereum/consensus-specs/blob/dev/specs/bellatrix/fork.md#upgrading-the-state states that for Bellatrix, it's:

        fork=Fork(
            previous_version=pre.fork.current_version,
            current_version=BELLATRIX_FORK_VERSION,
            epoch=epoch,
        ),

And for https://github.com/ethereum/consensus-specs/blob/dev/specs/capella/fork.md#upgrading-the-state, it's

        fork=Fork(
            previous_version=pre.fork.current_version,
            current_version=CAPELLA_FORK_VERSION,
            epoch=epoch,
        ),

That is, even if there was no observable beacon chain-time during which an ALTAIR_FORK_EPOCH == BELLATRIX_FORK_EPOCH or BELLATRIX_FORK_EPOCH == CAPELLA_FORK_EPOCH network existed in that intermediate fork, it will still show up as state.fork.previous_version of the fork to which it was upgraded, not necessarily the chronologically previous fork in question (e.g., the fork which might appear in the beacon API fork schedule).

This means that signatures of attestations, for example, from a slot or two before an ALTAIR_FORK_EPOCH == BELLATRIX_FORK_EPOCH or BELLATRIX_FORK_EPOCH == CAPELLA_FORK_EPOCH transition and then included in slots (which is supposed to be valid for up to 32 slots) cannot be verified afterward by a conforming client using https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/beacon-chain.md#get_domain as written:

def get_domain(state: BeaconState, domain_type: DomainType, epoch: Epoch=None) -> Domain:
    ...
    fork_version = state.fork.previous_version if epoch < state.fork.epoch else state.fork.current_version
    ...

because fork_version cannot access an old enough version in state by then.

When fork transitions only occur on successive epochs, the attestations which trigger this are old enough to be invalid (ATTESTATION_PROPAGATION_SLOT_RANGE == 32), but in either the minimal preset or immediately-after-each-other fork transitions, there can be glitches around these fork boundaries.

May 24 '22 17:05 tersec

this system was written w/ the expectation that forks would be far enough apart to not hit this issue and we only really see this type of thing in artificial settings like testnets etc.

to fix it, we would need to store the full fork schedule in the beacon state and modify many of the helpers to account for finding the right fork version (or even rewrite the messages to accelerate search)

I personally don't think it is worth the additional consensus complexity to support this type of thing although if there is some huge thing I'm missing I'm open to discussing further :)

May 26 '22 15:05 ralexstokes

I agree that this is somewhat theoretical, insofar as real networks can be constructed not to have this problem. I also agree that fixing properly this might not be worth the technical complexity, for the reasons you outline.

My concerns are:

those artificial test cases come up disproportionately often, in, well, testing. People running Kurtosis, Hive and other short-lived local testnets for testing purposes would generally prefer not to waste time on forks they're not testing, so there's a temptation to trigger these edge cases. Either these should be well-defined or prohibited by the spec;
as written, the spec doesn't mention this, so it relies on a kind of folk knowledge creating potential consensus disagreements (e.g., above, which attestations to count towards a block's validity, per fork choice rules, where some CL clients might decide to strictly use information in state.fork and others might rely on their knowledge of the fork schedule as a whole, and the intuitively "reasonable" one might not line up with the technically-correct one, an unfortunate situation for a specification); and
the mismatch between minimal presets' 8 slots per epoch and ATTESTATION_PROPAGATION_SLOT_RANGE == 32 regardless of preset results in this occurring even without the more obviously artificial cases where fork epochs equal each other, and at least seems to be an example of where there's supposed to be a not-actually-coincidental lineup of numeric constants that allows the protocol to work, but only under mainnet presets, which risks creating false-positives in terms of apparent bugs in the minimal preset that can't occur in mainnet and reduces the utility of the minimal preset.

I'd prefer to specify something along the lines of, e.g., no two fork epochs can be equal if they're not 0. If people want to run clients or tests that way regardless, but then the outcome is not defined per spec. Already, one can reasonably assert that CAPELLA_FORK_EPOCH >= BELLATRIX_FORK_EPOCH >= ALTAIR_FORK_EPOCH >= GENESIS_EPOCH, but if BELLATRIX_FORK_EPOCH == ALTAIR_FORK_EPOCH > 0 or CAPELLA_FORK_EPOCH == BELLATRIX_FORK_EPOCH > ALTAIR_FORK_EPOCH > GENESIS_EPOCH won't actually reliably work, without glitches, then the spec note this.

This way, Kurtosis, Hive, and similar systems could elide an arbitrary number of forks initially, because no ambiguity exists, but could not skip multiple forks afterwards.

May 27 '22 08:05 tersec

Multiple fork transitions on same slot/epoch discard old `fork_version`s