Typing problems when a container is re-defined in a superseding phase
Newer phases re-define containers like BeaconState, BeaconBlockBody, AttestationData, etc. Typically a new field is added, but a field's name or type can be changed as well. In some cases this leads to typing problems (see below).
Currently, such problems matter only in a static type checking context. However, pyspecs may eventually use isinstance, e.g. to discriminate Union components (see, for example, #2333). Thus, typing details may affect runtime behavior too.
One particular example is update_pending_votes in sharding. BeaconBlockBody contains a list of phase0.Attestation, each of which in turn contains a data field of type phase0.AttestationData.
The re-defined sharding.process_attestation calls update_pending_votes, which at some point accesses shard_header_root. That field is defined in sharding.AttestationData, not in phase0.AttestationData, and the re-defined sharding.AttestationData is not a subclass of phase0.AttestationData.
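The mismatch can be reproduced outside the spec with a minimal sketch. The classes below are hypothetical plain-dataclass stand-ins for phase0.AttestationData, sharding.AttestationData and phase0.Attestation (the real containers are SSZ types with more fields), and update_pending_votes_sketch is a hypothetical reduction of the attribute access in update_pending_votes. Dataclasses do not enforce annotations at runtime, so the sharding instance is accepted; the failure only appears when a phase0 instance reaches the attribute access.

```python
from dataclasses import dataclass


# Hypothetical stand-ins, not the real SSZ containers.
@dataclass
class Phase0AttestationData:
    slot: int
    index: int


# The sharding re-definition adds a field but does NOT subclass the phase0 type.
@dataclass
class ShardingAttestationData:
    slot: int
    index: int
    shard_header_root: bytes


@dataclass
class Phase0Attestation:
    data: Phase0AttestationData  # annotated with the phase0 type


def update_pending_votes_sketch(attestation: Phase0Attestation) -> bytes:
    # A static checker rejects this access: Phase0AttestationData has no such field.
    return attestation.data.shard_header_root  # type: ignore[attr-defined]


# Accepted at runtime (annotations are not enforced), but ill-typed statically.
sharding_att = Phase0Attestation(ShardingAttestationData(1, 0, b"\x01"))  # type: ignore[arg-type]
phase0_att = Phase0Attestation(Phase0AttestationData(1, 0))

print(update_pending_votes_sketch(sharding_att))  # b'\x01'
try:
    update_pending_votes_sketch(phase0_att)
except AttributeError as err:
    print("AttributeError:", err)
print(issubclass(ShardingAttestationData, Phase0AttestationData))  # False
```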
There are similar problems with Validator in the custody_game phase.
One way to solve the problem could be to make sharding.AttestationData extend phase0.AttestationData. Then phase0 containers (e.g. phase0.Attestation) could "legally" contain them. However, one would additionally need a cast in update_pending_votes. Since typing.cast doesn't check anything at runtime, one could do something like this:
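```python
from typing import cast

def update_pending_votes(state: BeaconState, attestation: phase0.Attestation) -> None:
    # cast() performs no runtime check, so verify the actual type first
    assert isinstance(attestation.data, sharding.AttestationData)
    attestation_data = cast(sharding.AttestationData, attestation.data)
    ...
```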
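A self-contained version of the subclassing approach, again with hypothetical dataclass stand-ins rather than the real SSZ containers. Here ShardingAttestationData extends Phase0AttestationData, so a Phase0Attestation may legally hold it, and the assert performs the runtime check that typing.cast skips.

```python
from dataclasses import dataclass
from typing import cast


# Hypothetical stand-ins, not the real SSZ containers.
@dataclass
class Phase0AttestationData:
    slot: int
    index: int


# The sharding re-definition now extends the phase0 type.
@dataclass
class ShardingAttestationData(Phase0AttestationData):
    shard_header_root: bytes


@dataclass
class Phase0Attestation:
    data: Phase0AttestationData


def update_pending_votes_sketch(attestation: Phase0Attestation) -> bytes:
    # cast() is a no-op at runtime, so the assert does the actual checking.
    assert isinstance(attestation.data, ShardingAttestationData)
    attestation_data = cast(ShardingAttestationData, attestation.data)
    return attestation_data.shard_header_root


att = Phase0Attestation(ShardingAttestationData(1, 0, b"\x01"))  # well-typed now
print(update_pending_votes_sketch(att))  # b'\x01'
```

With mypy or pyright, `assert isinstance(...)` alone already narrows the type of `attestation.data`, so the explicit cast mainly documents intent. Note also that assert statements are stripped under `python -O`.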
However, this approach won't always work. For example, altair.BeaconState probably shouldn't be made to extend phase0.BeaconState, as it replaces the *_epoch_attestations fields with *_epoch_participation.
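Subclassing can only add fields, never remove them. A sketch with hypothetical, heavily simplified dataclass stand-ins (one field each; the real BeaconState has many more) shows that an Altair state extending the phase0 state would still carry the replaced attestation field:

```python
from dataclasses import dataclass, field, fields
from typing import List


# Hypothetical, heavily simplified stand-in for phase0.BeaconState.
@dataclass
class Phase0BeaconState:
    previous_epoch_attestations: List[object] = field(default_factory=list)


# If the Altair state extended the phase0 one, the replaced field would be inherited.
@dataclass
class AltairBeaconState(Phase0BeaconState):
    previous_epoch_participation: List[int] = field(default_factory=list)


print([f.name for f in fields(AltairBeaconState)])
# ['previous_epoch_attestations', 'previous_epoch_participation']
```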
Another problem is that mutable collections like ssz.List are invariant: e.g. List[sharding.Attestation, ...] would not be a subtype of List[phase0.Attestation, ...], even if sharding.Attestation extended phase0.Attestation. Checking or converting long lists can also be expensive at runtime.
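Invariance is required because a mutable list accepts writes. The sketch below uses hypothetical stand-in classes and Python's built-in list rather than ssz.List. A static checker such as mypy rejects the call to append_phase0 because List is invariant, while Python itself runs it and leaves a phase0 object inside a list that is supposed to hold only sharding objects. A read-only Sequence, by contrast, is covariant.

```python
from typing import List, Sequence


# Hypothetical stand-ins, not the real SSZ containers.
class Phase0Attestation:
    pass


class ShardingAttestation(Phase0Attestation):
    shard_header_root = b"\x00"  # sharding-only attribute


def append_phase0(attestations: List[Phase0Attestation]) -> None:
    # Perfectly valid for a List[Phase0Attestation].
    attestations.append(Phase0Attestation())


def count_read_only(attestations: Sequence[Phase0Attestation]) -> int:
    # Sequence is read-only, hence covariant: Sequence[ShardingAttestation] is accepted.
    return len(attestations)


sharding_list: List[ShardingAttestation] = [ShardingAttestation()]
print(count_read_only(sharding_list))  # 1, accepted by mypy
append_phase0(sharding_list)  # mypy error (List is invariant), but runs at runtime
print(hasattr(sharding_list[-1], "shard_header_root"))  # False
```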
Additionally, runtime checks like isinstance(attestation.data, sharding.AttestationData) can only detect violations of such "typing rules" when they occur; they cannot guarantee their absence, whereas static analysis can.
As part of my research, I plan to investigate customized typing rules that model Python's dynamic duck typing more precisely. That is, it should sometimes be possible to prove that Attestation::data points only to sharding.AttestationData instances, under certain conditions of course (e.g. BeaconBlockBody is well-formed according to the sharding "rules").
One tricky aspect here is that static type checking depends on exactly how the full phase specification is constructed from the relevant phase definitions.
For example, in the case above, one would additionally need to re-define the Attestation, BeaconBlock and SignedBeaconBlock classes in sharding, so that sharding.AttestationData is reachable from a beacon block. This would also require pulling in other necessary methods from prior phases.
Constructing the full phase specification this way can be automated (manual overrides may be necessary for tricky cases), but it will likely be a non-trivial procedure.
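One part of such a construction is deciding which containers must be re-defined once one of them changes: every container that (transitively) references a re-defined container has to be re-defined too. A minimal sketch, assuming a hypothetical, hand-written "contains" graph that is much smaller than the real one (BeaconState and many other containers are omitted), computes that set with a reverse-reachability search:

```python
from typing import Dict, List, Set

# Hypothetical, simplified graph: container -> container types its fields reference.
CONTAINS: Dict[str, List[str]] = {
    "SignedBeaconBlock": ["BeaconBlock"],
    "BeaconBlock": ["BeaconBlockBody"],
    "BeaconBlockBody": ["Attestation"],
    "Attestation": ["AttestationData"],
    "AttestationData": ["Checkpoint"],
    "Checkpoint": [],
}


def containers_to_redefine(contains: Dict[str, List[str]], redefined: Set[str]) -> Set[str]:
    """Return every container that transitively references a re-defined one."""
    # Invert the graph: for each type, the containers that reference it directly.
    referenced_by: Dict[str, Set[str]] = {name: set() for name in contains}
    for parent, children in contains.items():
        for child in children:
            referenced_by.setdefault(child, set()).add(parent)
    # Depth-first walk upwards from the re-defined containers.
    result: Set[str] = set()
    stack = list(redefined)
    while stack:
        name = stack.pop()
        for parent in referenced_by.get(name, set()):
            if parent not in result and parent not in redefined:
                result.add(parent)
                stack.append(parent)
    return result


print(sorted(containers_to_redefine(CONTAINS, {"AttestationData"})))
# ['Attestation', 'BeaconBlock', 'BeaconBlockBody', 'SignedBeaconBlock']
```

A real tool would derive the graph from the phase definitions and would also need to track functions whose signatures mention the affected containers.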
I am closing this issue because it seems stale. Please do not hesitate to reopen it if this is a mistake.