Problematic cost model serialization
The current cost model serialization scheme is problematic: https://github.com/input-output-hk/cardano-ledger/blob/2505b7f103c78ee5a230b3e302e31d9607fffbd5/eras/babbage/test-suite/cddl-files/babbage.cddl#L324-L327
There are two problems:
- We cannot add a `V3` cost model in the same protocol parameter update that initiates a hard fork. We can only count on nodes to be updated after a hard fork (`chainChecks` provides this fantastic guarantee), and so nodes that have not updated would not be able to deserialize a `V3` cost model. This is operationally annoying for us, and also means we have to wait an extra epoch to use new versions of Plutus.
- We cannot add new fields to existing cost models.
We need a more flexible serialization scheme that addresses these two problems. Note that the ledger deserializes the CBOR list of integers into a map by using keys provided to us by the Plutus library (where the list is assumed to be ordered corresponding to the alphabetical sorting of the keys).
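The current list-to-map conversion might be sketched like this; `costModelFromList` and its arguments are hypothetical names for illustration, not the ledger's actual API:

```haskell
import           Data.List (sort)
import qualified Data.Map as Map
import           Data.Text (Text)

-- Hypothetical sketch: pair the flat CBOR list of integers with the
-- alphabetically sorted parameter names supplied by the Plutus library.
costModelFromList :: [Text] -> [Integer] -> Maybe (Map.Map Text Integer)
costModelFromList paramNames ints
  | length ints == length paramNames =
      Just (Map.fromList (zip (sort paramNames) ints))
  | otherwise = Nothing  -- the current rigid scheme rejects any other length
```

The rigidity is in the last guard: any list whose length does not exactly match the known parameter count fails to deserialize, which is what makes adding fields or versions impossible without new software.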
I talked with @Soupstraw, @bezirg, and @michaelpj, and we made this plan:
The Plan
- The Plutus function `mkEvaluationContext` will be changed to take `[Integer]` instead of `Map Text Integer` (matching the wire spec).
  - If `mkEvaluationContext` receives fewer integers than it needs (for a given version of Plutus), it will return an error message.
  - If `mkEvaluationContext` receives at least as many integers as it needs, it will return an evaluation context, possibly with a warning that too many integers were supplied and ignored.
- The `CostModel` serialization will be made more permissive.
  - It will accept any `[int]`.
  - It will store whatever `[int]` is given to it, and also the result of `mkEvaluationContext`.
  - If it is for a version of Plutus that the ledger is unaware of, it will not call `mkEvaluationContext` but instead store an appropriate error.
- Note that we need to figure out a way to become more permissive in the cost model serialization scheme only at a hard fork boundary, in order to not risk splitting the network. Cost models can currently only be changed by way of the governance mechanism, however, so if we trust that all V1 cost model updates prior to a hard fork have exactly 166 integers (and 175 integers for V2), we could just relax the scheme to `[int]` (at the risk of an operational mistake by the governance key holders). In the past, we have handled such a maneuver by replacing a type with a type family, but that tends to be a fair amount of work and complicates the code. Perhaps there are other ideas?
How this works, new language built-ins
Suppose version x of the node supports Plutus V2 cost models with 10 fields. Suppose version x+1 of the node supports Plutus V2 cost models with 11 fields. Let f be the new field, let m be the current major protocol version, and suppose that Plutus V2 will not support f until m+1.
During major protocol m:
- Node `x` sees a cost model with 11 fields; it makes an `EvaluationContext` with the 10 fields it knows about, but also stores a warning and the original 11 fields.
  - It happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees `f` (it will fail to deserialize).
- Node `x+1` sees a cost model with 11 fields; it makes an `EvaluationContext` with all of them.
  - It also happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees `f` (based on `m`).
- Node `x` shuts down, the operator updates the software, and comes back up with a node `x+1`. Upon re-serialization of the ledger state, the operator is now in the case above (a normal `x+1` node).
During major protocol m+1:
- Node `x` can no longer participate in the chain, due to the `ObsoleteNode` error.
- Node `x+1` is ready to process scripts with `f` in V2.
How this works, new Plutus versions
Suppose version x of the node does not know anything about Plutus V3. Suppose version x+1 of the node adds support for Plutus V3. Let m be the current major protocol version, and suppose that Plutus V3 is introduced at m+1.
During major protocol m:
- Node `x` sees a cost model for V3. It stores the cost model integers, but in place of an `EvaluationContext` it stores an error.
  - Any V3 script will be rejected since `x` does not know how to deserialize V3.
- Node `x+1` sees a cost model for V3; it makes an `EvaluationContext` for it.
  - Any V3 script will be rejected by the Plutus evaluator based on `m`.
- Node `x` shuts down, the operator updates the software, and comes back up with a node `x+1`. Upon re-serialization of the ledger state, the operator is now in the case above (a normal `x+1` node).
> It will store whatever `[int]` is given to it, and also the result of `mkEvaluationContext`

... which might be an error. Do we need to store the underlying `[Int]`? I guess it's needed for re-serializing.
> the ledger gives the Plutus evaluator `m`, and Plutus knows `f` isn't okay during `m`

This only applies for node `x+1`. Node `x` has an old Plutus which doesn't know about `f` at all, but that's also fine, it'll be a deserialization failure that happens in the same place.
> Any V3 script will be rejected for not having a cost model.

Won't it be rejected before that for simply not being recognized as a known language by node `x`?
> Any V3 script will be rejected by the Plutus evaluator based on `m`

This is currently not happening: we should change this. We're currently relying on the absence of the cost model to disable a version.

Although in the scenario you mention, node `x+1` will see a cost model for V3 in an update proposal, but it won't actually be applied until the HF that introduces V3, right? So it is still fine... but I'd be more comfortable with an explicit guard on our side also.
> Any V3 script will be rejected for not having a cost model.
>
> Won't it be rejected before that for simply not being recognized as a known language by node `x`?

I agree with @michaelpj on this. The ledger calls into plutus-ledger-api in a static way; a specific node version can call into a fixed number of statically linked Plutus versions, e.g. V1 and V2. If a newer, currently unknown language version comes in, e.g. V3, the node running the old software will not be statically linked with any V3.
> Any V3 script will be rejected by the Plutus evaluator based on `m`
>
> This is currently not happening: we should change this. We're currently relying on the absence of the cost model to disable a version.

I think we can enforce this on plutus-master by having a simple extra check inside `evaluateScriptRestricting`/`Counting`:

```haskell
when (passedProtocolVersion < expectedNextMajorHardForkIntroducingThisVersion) $
  fail "in phase2"
```
> Although in the scenario you mention, node `x+1` will see a cost model for V3 in an update proposal, but it won't actually be applied until the HF that introduces V3, right? So it is still fine... but I'd be more comfortable with an explicit guard on our side also.

Yes, an extra guard on our side can be helpful in case of a mismatch between the Plutus code version and the ledger code version.

@JaredCorduan your logic makes sense.
> Note that we need to figure out a way to become more permissive in the cost model serialization scheme only at a hard fork boundary, in order to not risk splitting the network. Cost models can currently only be changed by way of the governance mechanism, however, so if we trust that all V1 cost model updates prior to a hard fork have exactly 166 integers (and 175 integers for V2), we could just relax the scheme to `[int]` (at the risk of an operational mistake by the governance key holders). In the past, we have handled such a maneuver by replacing a type with a type family, but that tends to be a fair amount of work and complicates the code. Perhaps there are other ideas?

The problem and this thing with the type families I don't get at all, so no ideas from my side.
@JaredCorduan If we add a check on our side to reject language versions that shouldn't be enabled in particular protocol versions, we have a choice about whether we include it in `isScriptWellFormed` or not. If we do, this will be a phase-1 failure; if we don't, it will be a phase-2 failure. I think we should try and make as many things as possible phase-1 failures, so I'm inclined to do it unless you disagree?
@michaelpj

a) If we make it a phase-1 error, we preclude submitting+executing any new-language scripts until the HF.
b) If we make it a phase-2 error, we allow submitting new-language scripts but disallow executing them until the HF.

I don't know if there is a use case for (b).
I guess it maybe prevents you from submitting a reference script with a new language version before the HF? Does the ledger deserialization refuse to deserialize transactions containing newer scripts from newer languages before the HF?
> Do we need to store the underlying `[Int]`? I guess it's needed for re-serializing.

Exactly.
> the ledger gives the Plutus evaluator `m`, and Plutus knows `f` isn't okay during `m`
>
> This only applies for node `x+1`. Node `x` has an old Plutus which doesn't know about `f` at all, but that's also fine, it'll be a deserialization failure that happens in the same place.

Good catch, thank you! I will edit The Plan.
> Won't it be rejected before that for simply not being recognized as a known language by node `x`?

Yes, another good catch! I'll edit that as well.
> This is currently not happening: we should change this. We're currently relying on the absence of the cost model to disable a version. Although in the scenario you mention, node `x+1` will see a cost model for V3 in an update proposal, but it won't actually be applied until the HF that introduces V3, right? So it is still fine... but I'd be more comfortable with an explicit guard on our side also.

I would also much prefer it if Plutus could guard this for us as well; otherwise it depends on the timing of the updates (i.e. the governance holders have to remember that they must never update the new cost model an epoch before the HF vote, etc.).
> The problem and this thing with the type families I don't get at all, so no ideas from my side.

Yea, no worries, this is a problem on the ledger side.
> Does the ledger deserialization refuse to deserialize transactions containing newer scripts from newer languages before the HF?

That would be great. If we want to allow folks to create reference scripts in advance for unreleased languages, we would need a scheme for transaction outputs similar to the one we've just made for the cost models. It's the same problem: the current serialization scheme for transaction outputs does not allow for new languages in reference scripts, and we can only guarantee that folks have the new software after a hard fork. Having `isScriptWellFormed` check the Plutus version against the protocol version provides us a clean transition.
Let me leave future Jared some proof that the current serialization scheme is not flexible:

- Babbage uses the same script type as Alonzo: https://github.com/input-output-hk/cardano-ledger/blob/c32acb6e90ed89c58ef411baead4af554518ff0b/eras/babbage/impl/src/Cardano/Ledger/Babbage.hs#L204
- Reference scripts are just deserialized as CBOR-in-CBOR scripts: https://github.com/input-output-hk/cardano-ledger/blob/c32acb6e90ed89c58ef411baead4af554518ff0b/eras/babbage/impl/src/Cardano/Ledger/Babbage/TxBody.hs#L693
- The Alonzo script type deserialization is not flexible with respect to languages: https://github.com/input-output-hk/cardano-ledger/blob/c32acb6e90ed89c58ef411baead4af554518ff0b/eras/alonzo/impl/src/Cardano/Ledger/Alonzo/Scripts.hs#L391-L401
> I think we should try and make as many things as possible phase 1 failures

Absolutely. We can add this to our not-yet-existent list of guiding principles. (It's not dogma; everything is still open for discussion, though.)
> If we want to allow folks to create reference scripts in advance for unreleased languages

I think I would actively prefer it if they can't :D
> > If we want to allow folks to create reference scripts in advance for unreleased languages
>
> I think I would actively prefer it if they can't :D

That's definitely my preference as well, I was just trying to tease out the implications (a weak form of a proof by contradiction :) ).
the last piece of the puzzle here will be solved by #3014 :raised_hands: (namely how to gracefully change the serialization at a hardfork boundary)
:rofl: this was accidentally and automatically resolved by me saying:

> In order to resolve #2902, we still need to
@JaredCorduan AI at its finest. Joking, it was a regex "AI"
:tada:
@lehins and I had a discussion on this, and the conclusion is that there are a few problems with the "How this works, new language built-ins" part, and it doesn't currently work:
- A transaction proposing to update the number of fields to 11 does not deserialize, due to the use of `decodeCostModelFailHard`.
- "Node x sees a cost model with 11 fields, it makes an EvaluationContext with the 10 fields it knows about, but also stores a warning and the original 11 fields" does not appear to be implemented correctly.
- Most importantly, `mkEvaluationContext` fails upon receiving fewer parameters than expected. At the beginning of protocol version `m`, there are only 10 parameters, and it only becomes 11 later. But node `x+1` always expects at least 11. How can node `x+1` validate the chain then?
cc @michaelpj @bezirg
> Most importantly, `mkEvaluationContext` fails upon receiving fewer parameters than expected. At the beginning of protocol version `m`, there are only 10 parameters, and it only becomes 11 later.

So, what we have discussed in this context is, I think:

- We end up in the situation where the major protocol version says that e.g. PlutusV3 is allowed, but we do not yet have a cost model for it. We need this to be fine.
- The only sensible behaviour in this situation is for PlutusV3 scripts to fail at evaluation time. So in the interval between the protocol version change and the cost model being installed, you just can't run any PlutusV3 scripts, which seems okay.
- `mkEvaluationContext` therefore needs to do the following when given too few parameters:
  - Not fail at construction time
  - Fail when evaluating any script

Does that sound about right?
Also it would be super if we could somehow set up a test for this. I don't know if the ledger has cross-hard-fork tests?
> So, what we have discussed in this context is, I think:

I think your first two points are related to the case where we add a new Plutus version, but the problem here is with adding new builtins to an existing Plutus version.
> `mkEvaluationContext` therefore needs to do the following when given too few parameters: Not fail at construction time

The solution outlined above says: "If `mkEvaluationContext` receives fewer integers than it needs (for a given version of Plutus), it will return an error message". And this has always been how it works.

If we change `mkEvaluationContext` to not fail when given too few parameters, then a problem (that @lehins brought up in the discussion) is that this would allow someone to submit a proposal reducing the number of cost model parameters, which cannot be allowed. But perhaps we can rely on the committee/DReps to reject such proposals? Either way, there seems to be a lot of work to be done to make this actually work.
> Fail when evaluating any script

We'll also need to make sure there's no significant performance overhead if we need to check the condition at script evaluation time.
> I think your first two points are related to the case where we add a new Plutus version, but the problem here is with adding new builtins to an existing Plutus version.

Right, so it is more subtle. Perhaps the tightest solution is to fail iff we are missing parameters that are needed for a builtin in the script at hand. But that's tricky. One way we could do it would be to set any missing parameters to MAX_VALUE, so that any attempt to use them will just blow out the budget?
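The MAX_VALUE idea could be sketched like this (`padParams` is a hypothetical helper, not an agreed design):

```haskell
import Data.Int (Int64)

-- Hypothetical sketch: pad a too-short cost model parameter list with a
-- huge cost, so any builtin whose parameters are missing immediately
-- exhausts the execution budget instead of failing at construction time.
padParams :: Int -> [Integer] -> [Integer]
padParams expected ints =
  take expected (ints ++ repeat (toInteger (maxBound :: Int64)))
```

This keeps `mkEvaluationContext` total on short lists while preserving the invariant that scripts touching the missing builtins cannot succeed.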
OK, we might have a solution for this, which would require a minimal amount of work.

The most important part of the solution is for `mkEvaluationContext` to never fail if it receives 233 or more parameters, regardless of the protocol version that the node is running.

This would mean that Plutus script execution could receive an `EvaluationContext` that was built from 233 parameters at any point, regardless of the presence of a new primitive or the protocol version that the node is running. That is because anyone can propose and potentially enact a `CostModel` update with 233 parameters for PlutusV3, even after the intra-era hard fork that will introduce new primitives.

I also suggested we implement an optional primitive like that right now, which would work if we had 234 parameters in PV9. This would actually allow us to write some tests for this behavior before we hard fork into Conway.
By the way, The Plan above says:

> It happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees `f` (it will fail to deserialize).

> It also happily evaluates all V2 scripts, and Plutus will gracefully fail (phase 2) if it sees `f` (based on `m`).

I believe both are phase-1 failures, not phase-2.
> I believe both are phase-1 failures, not phase-2.

Yes, that will be either a `MalformedScriptWitnesses` or a `MalformedReferenceScripts` phase-1 validation failure, depending on where in the transaction the script is. In both cases it will have to be a Plutus script deserialization failure.