plutus icon indicating copy to clipboard operation
plutus copied to clipboard

Redundant scope checking

Open effectfully opened this issue 2 months ago • 5 comments

We perform scope checking before evaluating a program on the chain to ensure that the program is well-scoped and doesn't contain free variables.

If scope checking fails, then this is a phase-2 validation error, i.e. exactly the same situation as if we let the program to get evaluated (which would produce a free variable error at runtime, unless the free variable happened not to be on the evaluation path).

I.e. the scope check saves us some work when it comes to handling incorrect scripts. But given that the node still distributes the incorrect program to other nodes and records it on the blockchain (since scope checking is a phase-2 check), it doesn't save us that much.

What does the scope check cost us on the other hand? A lot. Adding the scope check to validation-decode benchmarks measuring the cost of decoding scripts increases runtime by 25% (see this PR). I.e. 20% of the script preparation stage (decoding + version check + scope check) is wasted on the scope check (and 3% of total validation times for the full-validation benchmarks, see this PR removing the scope check).

I assume, this includes reference scripts as well, since the scope check isn't a part of deserialization, but rather something that we do right before evaluation.

According to @colll78:

the revenue of the top dApps (including Minswap, SundaeSwap and Liqwid) was nearly cut in half by the introduction of the cost per reference script bytes parameter.

I.e. the useless scope check is responsible for squeezing 10% (20% * 50%) out of margins of businesses running on the Cardano blockchain. This is absolutely ridiculous in my opinion.

How did we add this check in the first place? By accident, to quote @michaelpj:

The only reason we have this behaviour is because we had the de Bruijn conversion pass in there which incidentally enforced this property. It wasn't a conscious choice.

So how come it's still there?

Originally, it wasn't removed because there was an idea that maybe doing scope checking once before evaluation is cheaper than performing it during evaluation over and over again. This idea was wrong, I investigated it (resulting in this massive performance improvement as a side effect) and I strongly believe that performing the scope check upfront does not and cannot save us anything at runtime.

The GHC Core for the new code handling Var in the CEK machine is available here -- check for yourself and let me know if you see how assuming that all variables are in-scope can save us anything in there.

This seems fairly uncontroversial, so again, how come the scope check is still there?

Well, our Agda formalization, plutus-metatheory relies on it being there, quoting @jmchapman:

The current semantics of the language is for closed terms. So, moving to open terms would lead to undefined behaviour. Weakening this guarantee would make it easier to make a mistake in the implementation of the CEK machine.

My arguments to that are:

  1. There's no way we could ignore handling the free variable case in the Haskell evaluator. It does not operate on intrinsically-scoped terms, we have to somehow handle a free variable in the implementation regardless of how sure we are that this code path can never be triggered. Consequently we have to specify and verify that case simply because it's in there.
  2. The implementation of the CEK machine does not consult the formalization in any way, we do not track what the formalization is up to in order to adapt the implementation accordingly. I've always assumed the flow would be the opposite: the formalization builds a model of what the implementation does, since it's the latter that is constrained by the real world requirements, not the former. 10% of revenue is hella a lot to pay so that the metatheory can pretend that it formalizes the real thing.
  3. The current approach is this: instead of formalizing the implementation of the CEK machine we assume that certain clauses in it cannot happen due to a check performed in plutus-ledger-api, which makes the implementation of the CEK machine not self-contained from the point of view of the metatheory. And there's such a distance to travel from plutus-ledger-api to untyped-plutus-core to index-envs to conclude it's all fine together, just not individually. Instead, the metatheory could just formalize what's in the implementation of the CEK machine, it's perfectly sound on its own.
  4. The implementation of variable lookup is quite non-trivial and it's hard to convince oneself that it's correct. And it's used on the critical path: this is how we do variable lookup in the CEK machine. So how come it's not formalized while being critical, non-trivial and perfectly formalizable? Well, the metatheory does a different thing on the basis of there existing the scope check pass somewhere upstream, this creates divergence between the formalization and the implementation, and the entirety of variable lookup logic gets sucked up into that divergence.

My conclusion is that not only is the scope check very expensive while being literally useless for on-chain validation, it also makes the implementation less safe, because we assume safety where there's in fact a huge unformalized gap between the implementation and the formalization.

We should just remove the scope check.

effectfully avatar Sep 25 '25 22:09 effectfully

It’s reallyhard to argue for keeping it. Maybe the path forward is to remove the scope check now to stop bleeding revenue, then prioritize formalizing the actual variable lookup logic? That way we get the performance win immediately and improve our verification coverage in the long run.

maureenblack avatar Sep 25 '25 23:09 maureenblack

Hi Roman,

This is not 'the implementation' vs agda, I did not mention agda in the above quote. This is a proposed change to the design, and specifically the semantics of the language. The proposal should be accompanied by an update to the specs (where the design is defined).

We must have a clear and practical design/specification for plutus which covers the syntax and semantics of the language. This should be followed and informed by implementations that all behave the same to ensure compatibility in the ecosystem and uninterrupted service and integrity of the network. The design is implemented by the haskell node, amaru (rust) node, layer 2 solutions, PL tooling for Plinth, Aiken, etc, and formal verification tooling.

There should not be a gap between the specification/design and the haskell implementation. This must be fixed! Maintenance of these are the responsibility of your team and the resources it has access to. It's not a one way street. Feedback from an implementation should suggest changes to the spec/design which will evolve and any design change implemented in a key implementation should be accompanied by a spec change to ensure compatibility, otherwise they should not go ahead.

I said earlier:

The current semantics of the language is for closed terms. So, moving to open terms would lead to undefined behaviour. Weakening this guarantee would make it easier to make a mistake in the implementation of the CEK machine.

The most important thing here is that this a design change - to change it you need to actually change the design not just one implementation. The change should be analysed, tradeoffs considered, and affected parties consulted.

i.e., you (the team) need to propose a change to the spec (both latex and agda, hopefully these will converge in future) and get it properly approved and agreed.

You cannot just change one single implementation and leave it at that.

jmchapman avatar Sep 26 '25 09:09 jmchapman

@jmchapman is right that in the future we can no longer do such things unilaterally (and maybe we already can't), due to other node implementations. I don't know if any of them is close to feature complete, but the node diversity initiative is certainly well under way.

That means any change that requires a HF should start with a CIP. And perhaps even some changes that don't require a HF. Once the CIP is merged, we then update the spec or blueprint or whatever the source of truth for other implementations is, along with the Haskell implementation.

This is especially relevant in this particular instance, because @effectfully is proposing to change the implementation, and leave the metatheory to be updated later. This (at least theoretically) creates the possibility of making mistakes, so even though @effectfully himself is fully convinced that this is Ok, it will be good to circulate this wider and get more opinions. Even if this is indeed Ok to do, other people may have good suggestions on ways to update the spec/metatheory that we can take.

For the next intra era HF, we are already releasing many features (array, value, droplist, msm, enabling all builtins in V1/V2, casing), and it's already quite a burden on us to ensure all of these are done properly. So I wouldn't want to push too hard to get the scope check change in. But we can still get things started. So in summary, we should start by writing a CIP.

zliu41 avatar Sep 27 '25 01:09 zliu41

No other nodes are even remotely close to block production. If anything, now is the time to do a deep review into the spec, and make these changes, because in a year and a half that won't be the case anymore.

Also, it is important to note that as of now, no alternative node / alternative CEK implementation refers to the spec as the source of truth, they unilaterally refer to the Haskell implementation as the source of truth. It is possible that this changes in the next few years as those nodes reach production but as of now that's how things are.

colll78 avatar Oct 02 '25 19:10 colll78

An alternative idea that will keep scope checking but make it way cheaper than it is right now:

Fuse the scope checking inside the flat deserialisation pass. I had this idea a while ago, but kept from implementing because flat was an outside dependency. Now that we have forked flat this will become much easier.

bezirg avatar Nov 18 '25 08:11 bezirg