ouroboros-consensus icon indicating copy to clipboard operation
ouroboros-consensus copied to clipboard

Simplify the `protVer` arguments

Open nfrisby opened this issue 2 years ago • 9 comments

Without loss of generality, focus on Allegra. Start from the binding of protVerAllegra in protocolInfoCardano. EG at https://github.com/input-output-hk/ouroboros-consensus/blob/0079ee65f6c49d1c05a8d30ffa492b20c46b12b8/ouroboros-consensus-cardano/src/ouroboros-consensus-cardano/Ouroboros/Consensus/Cardano/Node.hs#L658

There, protVerAllegra:

  • (A) is passed to mkShelleyBlockConfig
  • (B) influences maxMajorProtVer

Also, maxMajorProtVer:

  • (C) is passed to tpraosParams and used to construct praosParams, to eventually be an upper bound in the ProtocolHeaderSupportsEnvelope envelopeChecks method (to throw the "obsolete node" errors).
  • (D) is passed to mkPartialLedgerConfigShelley which eventually uses it to constructor SL.Globals
mkShelleyBlockConfig ::
     ShelleyBasedEra era
  => SL.ProtVer
  -> SL.ShelleyGenesis (EraCrypto era)
  -> [SL.VKey 'SL.BlockIssuer (EraCrypto era)]
  -> BlockConfig (ShelleyBlock proto era)
mkShelleyBlockConfig protVer genesis blockIssuerVKeys = ShelleyConfig {
      shelleyProtocolVersion  = protVer
    , shelleySystemStart      = SystemStart  $ SL.sgSystemStart  genesis
    , shelleyNetworkMagic     = NetworkMagic $ SL.sgNetworkMagic genesis
    , shelleyBlockIssuerVKeys = Map.fromList
        [ (SL.hashKey k, k)
        | k <- blockIssuerVKeys
        ]
    }
data instance BlockConfig (ShelleyBlock proto era) = ShelleyConfig {
      -- | The highest protocol version this node supports. It will be stored
      -- the headers of produced blocks.
      shelleyProtocolVersion  :: !SL.ProtVer
-- | Praos parameters that are node independent
data PraosParams = PraosParams
    ...
    -- | All blocks invalid after this protocol version, see
    -- 'Globals.maxMajorPV'.
    praosMaxMajorPV        :: !MaxMajorProtVer,
data Globals = Globals
  ...
  , maxMajorPV :: !Version
  -- ^ All blocks invalid after this protocol version

  • (D) I just opened https://github.com/input-output-hk/cardano-ledger/issues/3682, which would eliminate this use.

  • (A) ultimately enables the node to write this version to a header field that is never scrutinized by the entire Cardano node code base. It is only used for SPOs to indicate in their blocks that they have upgraded their software in preparation for an imminent transition to the next ledger era.

  • (C) This use is motivated in a short paragraph in the the ledger spec, here: https://github.com/input-output-hk/cardano-ledger/blob/63d98c3f8e9eb2878cec3ab71c54fc40c798ac01/eras/shelley/formal-spec/protocol-parameters.tex#L162-L165 It is worth noting that the value scrutinized here doesn't come from the header, it instead comes from the ledger state (view). And most crucially: it is not the header from (A).

  • (B) The influence of allegraProtVer on maxMajorProtVer is via the latter's definition:

https://github.com/input-output-hk/ouroboros-consensus/blob/0079ee65f6c49d1c05a8d30ffa492b20c46b12b8/ouroboros-consensus-cardano/src/ouroboros-consensus-cardano/Ouroboros/Consensus/Cardano/Node.hs#L754-L775

Note that this use is exactly contrary to the warning comments where allegraProtVer is actually set, such as:

https://github.com/input-output-hk/cardano-node/blob/70dde196367e86c977fbe72167f4eb303b3ffa88/cardano-node/src/Cardano/Node/Protocol/Cardano.hs#L185-L194

        Consensus.ProtocolParamsAllegra {
          -- This is /not/ the Allegra protocol version. It is the protocol
          -- version that this node will declare that it understands, when it
          -- is in the Allegra era. That is, it is the version of protocol
          -- /after/ Allegra, i.e. Mary.
          allegraProtVer =
            ProtVer (natVersion @4) 0,
          allegraMaxTxCapacityOverrides =
            TxLimits.mkOverrides TxLimits.noOverridesMeasure
        }

It is currently my suspicion that all of these per-era *ProtVer arguments should be removed (thereby eliminating (B)), and replaced by some logic that defines maxMajorProtVer by analyzing the last given transitionIntraShelleyTrigger. And the per-era values used in (A) should instead all simply use maxMajorProtVer.

nfrisby avatar Aug 27 '23 23:08 nfrisby

Minor comments:

Note that this use is exactly contrary to the warning comments where allegraProtVer is actually set, such as:

https://github.com/input-output-hk/cardano-node/blob/70dde196367e86c977fbe72167f4eb303b3ffa88/cardano-node/src/Cardano/Node/Protocol/Cardano.hs#L185-L194

Yeah, I think this is a case where we actually loose generality by only considering Allegra: Ignoring experimental eras for now (this just changes what the "last era" is), protVerXXX has the following meaning ATM:

  • For all but the last era, it is the lowest major protocol version of the next era, ie if YYY is the successor of XXX, it is eraProtVerLow @YYY
  • For the last era, it is the highest major protocol of this last era, ie eraProtVerHigh @XXX.

Also see https://github.com/input-output-hk/ouroboros-consensus/pull/294#discussion_r1304111224

So in fact, we can safely use the protVer entry for the last era as the max major protocol version, despite the fact that protVerXXX is not a protocol version of the XXX era for all but the last era.

It is currently my suspicion that all of these per-era *ProtVer arguments should be removed (thereby eliminating (B)), and replaced by some logic that defines maxMajorProtVer by analyzing the last given transitionIntraShelleyTrigger.

That specific approach won't work directly:

  • TriggerHardFork does not necessarily contain a protocol version: https://github.com/input-output-hk/ouroboros-consensus/blob/0079ee65f6c49d1c05a8d30ffa492b20c46b12b8/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/HardFork/Simple.hs#L11-L24
  • Even if it did, it does not contain any information on whether the last era already sustained an intra-era HF. Eg Babbage had an intra-era HF (from 7 -> 8, "Valentine HF"), so the latest TriggerHardForkAtVersion (ignoring Conway, which is experimental and should hence not be considered unless enabled explicitly via a flag) will only be 7, but we need to set the max major protocol version to at least 8 to avoid considering all blocks after the Valentine HF as invalid.

The general approach can of course still work, we just have to pass in more stuff ("worst-case", just the max major protocol version directly).

And the per-era values used in (A) should instead all simply use maxMajorProtVer.

Intuitively, that sounds like a good plan; main question is: do we ever want forging nodes to put different versions in their header for different eras? Maybe a statistical use case similar to https://github.com/input-output-hk/cardano-node/blob/70dde196367e86c977fbe72167f4eb303b3ffa88/cardano-node/src/Cardano/Node/Protocol/Cardano.hs#L210-L213 could arise?

amesgen avatar Aug 28 '23 10:08 amesgen

protVerXXX has the following meaning ATM

The changing semantics of the field---especially in a way dependent on the --experimental flag---is confusing for a "configuration data" data structure😬.

That specific approach won't work directly.

Yeah, I figured there isn't quite enough information/it wasn't accessible right now. But this still seems right to me as a high-level design target.

do we ever want forging nodes to put different versions in their header for different eras?

In general, sounds useful. And indeed you linked to a former example use (which correctly comments itself as a "HACK" :/).

But it also sounds not strictly necessary. So at the very least, I think it should be provided in a separate "overrides" argument, eg (perhaps limited to the minor version?). More directly: I think we should remove it now and add something like that back when it's next actually needed.

nfrisby avatar Aug 28 '23 13:08 nfrisby

PR #294 is definitely related. For context: I hadn't looked over that PR before drafting this Issue.

nfrisby avatar Aug 28 '23 13:08 nfrisby

PR https://github.com/input-output-hk/ouroboros-network/pull/3891 also seems related

Also, I transferred Issue #325 from ouroboros-network repo.

Wise final words from Jared in his comment:

but it will take some thought, and updating the specs

nfrisby avatar Aug 28 '23 14:08 nfrisby

The MaxMajorPV was added in this commit https://github.com/input-output-hk/cardano-ledger/commit/441e23bc7541ab0a535e9d24cda1d836c04b64ed

Edit: Just for context: that commit is from March 2020, which is before the Byron-Shelley hardfork.

nfrisby avatar Aug 28 '23 14:08 nfrisby

Esgen and I just discussed this all during a call. There are several overlapping concerns, sadly.

Testing overrides

For testing, we let config override the version-based triggers in the HFC.

Experimental flags

There are two "experimental" flags in the node: ExperimentalHardForksEnabled (<-> npcExperimentalHardForksEnabled) (formerly TestEnableDevelopmentHardForkEras) and ExperimentalProtocolsEnabled (<-> pncExperimentalProtocolsEnabled) (formerly TestEnableDevelopmentNetworkProtocols).

The node ultimately passes ExperimentalProtocolsEnabled to Consensus as srnEnableInDevelopmentVersions.

  , srnEnableInDevelopmentVersions  :: Bool
    -- ^ If @False@, then the node will limit the negotiated NTN and NTC
    -- versions to the latest " official " release (as chosen by Network and
    -- Consensus Team, with input from Node Team)

The node uses ExperimentalHardForksEnabled to alter the upper bounds on the *ProtVer arguments it passes to Consensus. Re-read the Issue description to understand the consequences this flag therefore has. For example:

          Praos.babbageProtVer =
            if npcExperimentalHardForksEnabled
              then ProtVer (natVersion @9) 0
              else ProtVer (natVersion @8) 0,

It is not obvious to me how these two flags interact in general. In particular, suppose we have an experimental era, such as Conway right now. Is it necessarily the case that ExperimentalProtocolsEnabled enables (at least) the NTN and NTC versions that Conway requires? And so we should require that ExperimentalProtocolsEnabled implies ExperimentalHardForksEnabled?

When we instead don't have an experimental era, ExperimentalHardForksEnabled has no effect and so that implication wouldn't be necessary? For example, without an experimental era, ExperimentalProtocolsEnabled could be guarding merely new queries.

The Obsolete Node Checks

The MaxMajorPV mechanism introduced here https://github.com/input-output-hk/cardano-ledger/commit/441e23bc7541ab0a535e9d24cda1d836c04b64ed is older than the HFC. At a high-level, the HFC renders MaxMajorPV unnecessary: if the node doesn't yet know about the Nth ledger era, then it won't be able to even parse a header in that era. However, because of "intra-era hard forks", it may be that MaxMajorPV is necessary even in the presence of the HFC.

It seems possible that we could ban intra-era hard forks in the future. Or maybe even implement them as duplicates in the HFC type-level list (but that would be inconvenient in terms of performance, and Issue https://github.com/input-output-hk/ouroboros-consensus/issues/328, eg). Either would let us remove the MaxMajorPV mechanism, and instead rely only on the HFC for this.

Type-level ledger era major version intervals

The ledger statically assigns intervals of the major version to each era. For example:

https://github.com/input-output-hk/cardano-ledger/blob/a90833107aa98f359dbb839991582b397c1c4bc8/eras/allegra/impl/src/Cardano/Ledger/Allegra/Era.hs#L20-L26

-- | The Allegra era
data AllegraEra c


instance Crypto c => Era (AllegraEra c) where
  type PreviousEra (AllegraEra c) = ShelleyEra c
  type EraCrypto (AllegraEra c) = c
  type ProtVerLow (AllegraEra c) = 3

Alonzo is different because it contains a so-called "intra-era hard fork" (ie a hard fork that does not change the ledger era, and so does not involve the HFC).

https://github.com/input-output-hk/cardano-ledger/blob/a90833107aa98f359dbb839991582b397c1c4bc8/eras/alonzo/impl/src/Cardano/Ledger/Alonzo/Era.hs#L21-L28

-- | The Alonzo era
data AlonzoEra c


instance Crypto c => Era (AlonzoEra c) where
  type EraCrypto (AlonzoEra c) = c
  type PreviousEra (AlonzoEra c) = MaryEra c
  type ProtVerLow (AlonzoEra c) = 5
  type ProtVerHigh (AlonzoEra c) = 6

If it weren't for the testing overrides and the experimental flags, I think all of the node's use of protocol versions would be determined by those ProtVerLow and ProtVerHigh instances---even intra-era hardforks, since ProvVerHigh captures those. Sounds appealing.

nfrisby avatar Aug 28 '23 15:08 nfrisby

My mental model, although I know very little about the ExperimentalHardForksEnabled (so I also feel the need to sync on that), is that consensus and network flags are independent in general. However there might be hard forks which require to set both flags, or just one of them. You're right that TestEnableDevelopmentNetworkProtocols can limit queries, which is relatively small misconfiguration (doesn't break node-to-node, but might prevent testing some its features), but in the future we might have new consensus algorithm which will require an entirely new network (mini-)protocol (e.g. ouroboros-peras or ouroboros-laios). There's yet another level of complication: it could happen that we will have an experimental network protocol change which we don't want to enable, while consensus also requires another experimental feature from the network. Do we need to worry about this before it happens?

coot avatar Aug 28 '23 16:08 coot

Do we need to worry about this before it happens?

I think the only risk of not considering this would be surprise delays in releases (which are bad, but not as bad as bugs eg).

nfrisby avatar Aug 28 '23 16:08 nfrisby

I think so too.

Right now Handshake assumes that all versions are linearly (/ totally) ordered. But we could relax it to be a partial order, which would allow things like:

graph TD;
  1'-->0;
  1-->0;
  2-->1;
  2-->1';  

where 1 and 1' are the non comparable two different experimental version. Consensus could peak it's own max network version, the same with network. Assume that 1 contains some consensus feature while 1' some network features, and we don't want to publish both (e.g. 2). Then the network team would pick 1 as it's max version, and consensus would peak 1' as its max version.

Our picking algorithm would need to pick the smallest sub-partial order which is closed under supremum. This way if one enables only consensus experimental features one would end with versions smaller or equal to 1; if one only enables network experimental features one would end with versions smaller or equal to 1' and if one picks both then all versions smaller or equal 2.

Handshake would do the same as now: pick the largest version which belongs to both own supported versions and the presented one.

coot avatar Aug 28 '23 17:08 coot