plutus icon indicating copy to clipboard operation
plutus copied to clipboard

Casing on values of built-in types

Open effectfully opened this issue 1 year ago • 7 comments

We really want to have faster Data processing, but as per #6225 this is quite hard. Matching on integers is very inefficient too as one needs to perform repeated calls to IfThenElse and EqualsInteger (the possibility of using SOPs for matching on integers to solve the issue was brought up here). Matching on built-in booleans is also inefficient, particularly from the size point of view.

I feel like we can kill all these birds with a single stone: allow case to work on values of built-in types.

Bool

Take booleans, we could allow

case b
  branchFalse
  branchTrue

where

  • b is a Bool
  • branchFalse is executed if b is False
  • branchTrue is executed if b is True

(I'm making branchFalse come before branchTrue not only because false corresponds to 0 and true corresponds to 1, but also because the false case is more likely to terminate execution, hence being pretty-printed an average program would be more readable with the throwing clause appearing first. But we can make the ordering the same as in IfThenElse instead if we feel strongly about it, I personally don't care much)

This is as close as it gets to extending the UPLC AST with a constructor for IfThenElse without actually doing that. It should also be easy to implement, since Case is currently defined as

    | Case !ann !(Term name uni fun ann) !(Vector (Term name uni fun ann))

and all we need is to teach the CEK machine how to extract Terms from a Vector (plus check its length) and how to handle them depending on the type tag of the given constant.

Note how we would be able to address everything from #6598 without introducing any new builtins. For example if we had

assertAndContinueOrTrace :: string -> bool -> a -> a

and wanted to write force (assertAndContinueOrTrace msg b (delay x)), we could instead simply write

case b
  (trace msg error)
  x

This AST is of roughly the same size, but no force-delay bookkeeping is required, just like the additional builtin. The latter code is concise and on-point.

Integer

For integers we'll allow

case i
  branch0
  branch1
  branch2
  ...
  branchN

reducing to branch_i where i is the scrutinee or failing if i is not in [0, N].

This alone will optimize Data processing, because one will be able to write (pseudocode)

let (i, args) = unConstrData d
in case i
  (applyToList branch0 args)
  (applyToList branch1 args)
  ...
  (applyToList branchN args)

This is clearly unideal, since we very much want to bypass the unConstrData and applyToList noise, but this is a good start and an improvement in the right direction.

Note that associating case branches with hardcoded integers (0, 1 ... n) isn't much of a limitation, since one can recover full generality by adding an elemIndex builtin:

-- Same as `elemIndex` in Haskell except the first argument is a function taking the length of the
-- given list and returning something -- it's only called when the given element isn't found and
-- will normally be either `\n -> n` or `\_ -> -1`.
elemIndex :: (Integer -> Integer) -> a -> [a] -> Integer

For example the following:

case (elemIndex id 42 [5, 42, 1])
  branch5
  branch42
  branch1
  branchElemNotFound

will evaluate to branch42.

Functioning case on integers with the addition of elemIndex will also allow us to have Deterministic universal almost-unique Plutus Constructors if we ever feel like it.

Data

Ultimately we want case to handle Data.Constr the same way that it handles Term.Constr. There's still the limitation discussed in the "What exactly breaks if we add the unsafe constrTermFromConstrData?" section of #5777, but that one we should be able to address via multi-lambdas/multi-applications (see #6225). Which will help SOPs as well I believe, since multi-lambdas should be faster than iterated lambdas, plus UPLC is untyped, so there's still a possibility of getting the same mismatched semantics issue with SOPs as with Data.

So in the end we want

case (Constr 1 [a, b, c])
  branch0
  (\[x, y, z] -> body1)
  branch2

to evaluate to

(\[x, y, z] -> body) [a, b, c]

which is a multi-lambda \[x, y, z] -> multi-applied to [a, b, c].

I.e. that case means (pseudocode)

let x = a
    y = b
    z = c
in body1

and branch selection with Data will work the same way as with SOPs: for the Constr i args scrutinee the ith branch gets picked, so given that i is 1 in the example above, we pick the 1th branch and drop the 0th and the 2th ones (we only have the 2th one to make the example a bit more rich, nothing would change if it wasn't there). Except SOPs don't use saturated lambdas and pattern matching on Data has to (for reasons discussed in https://github.com/IntersectMBO/plutus/issues/5777 and https://github.com/IntersectMBO/plutus/issues/6225).

Note that casing on Data is orthogonal to pattern matching builtins, since the latter allow one to handle all constructors of Data, while case over Data only handles the Constr constructor in a specific SOP-like way.

Lists and pairs

I'm not sure how we should handle those in case-expressions. Just replicate pattern matching builtins? Or allow to match against multiple patterns like in Haskell (e.g. for lists [], [x], [x, y], x:y:z:xs etc)? Do we then want to change Case to contain proper patterns rather than arbitrary terms?

All of that is to be figured out, but this is way less important than speeding up processing of Data, so it can definitely wait.

Elaborate case to builtin calls?

Why reimplement case for booleans if we already have IfThenElse? Can we just elaborate case to IfThenElse under the hood? I believe the answer is yes, but it's awkward: IfThenElse expects values while we'll need to pass terms in there, which is doable as we can simply wrap terms into values (using VDelay for example), but then we need to unwrap the result, plus we want all the BuiltinRuntime nonsense (that is required for builtins but not case) to get inlined away and persuading GHC is always non-trivial, so I feel like it's much easier to just reimplement a few lines of logic than attempt a clever general solution requiring an order of magnitude more effort to implement and maintain.

Iteration 2

Bool

A natural contender for casing over Bools is a native IfThenElse (as per #6578):

data Term name uni fun ann
    = <...>
    | IfThenElse !ann !(Term name uni fun ann) !(Term name uni fun ann) !(Term name uni fun ann)

It's very hard to accommodate a native IfThenElse in the current AST structure due to the latter being completely unaware of what built-in types are available, but how much are we losing by using Case instead? Case is

    | Case !ann !(Term name uni fun ann) !(Vector (Term name uni fun ann))

If my calculations are correct, IfThenElse is 7 words and Case with two Terms inside of its Vector is 16 words (plus whatever the underlying Array# needs, for which this is probably a relevant piece of info), which is a pretty large overhead.

However it doesn't have to be Vector there, we don't use the slicing functionality anyway, so if we replace it with Array/SmallArray, we'll get Case down to 9 words (plus whatever the underlying Array#/SmallArray# needs), which isn't too bad.

In short, it does suck that we introduce overhead, but this is still an order of magnitude less overhead than with the current IfThenElse builtin requiring the argument to be evaluated first, which in turn requires us to put delays in there, which in turn requires us to wrap the application in a force, which in turn requires us to create a frame on the stack in the CEK machine -- all of that is hella expensive.

And given that accommodating IfThenElse as an AST node is simply not gonna happen without a complete redesign of UPLC, let's not let perfect be the enemy of good.

Data

How do we generalize from Bool and Integer given that handling Data requires us to bind variables?

If we take the example from the above:

case (Constr 1 [a, b, c])
  branch0
  (\[x, y, z] -> body1)
  branch2

then the generalization amounts to introducing multi-lambdas and somehow evaluating the application of (\[x, y, z] -> body1) to [a, b, c] at runtime.

Evaluating the application directly seems quite tricky. We'd have to save _ [a, b, c] in a frame, go into the right branch, evaluate the multi-lambda to WHNF and pop the frame off of the stack. Which sounds straightforward until one realizes that saving the frame is stack manipulation that we have to somehow do within some type class that doesn't know anything about the CEK machine. We're designing a very general machinery that should work for all kinds of evaluators and universes, much like the builtins machinery. Whatever we end up implementing has to be expressible as a constraint on uni and be pluggable into any evaluator. CEK machine frame manipulation is way too specific for that.

So instead of evaluating the application directly we can do what we did with pattern matching builtins and simply construct the application from within that uni constraint to let the evaluator (whatever it is) choose how to handle it. It's a bit awkward because the function stored in the branch is a term and the argument is a value, but seems doable, plus we can always turn a value into a term if we don't like the awkwardness (but that'll require reevaluating the value).

But we haven't even begun to discuss how we're gonna handle typing and compilation from higher-level langs. How are we gonna construct multi-lambdas in Plinth for example? We can't just use regular lambdas, because even with optimizations turned off GHC still feels free to rearrange the code. The only reason why we care about multi-lambdas this much is because it gives us the exactness of application, but that's precisely what GHC may feel compelled to mess with.

Multi-lambdas do introduce some overhead though. If we had a version of Case binding variables directly

    | CaseBind !ann !(Term name uni fun ann) !(Vector ([name], Term name uni fun ann))

we wouldn't need to go into the branch to evaluate it to WHNF as the names are immediately available. Hence instead of creating an application from within the uni constraint, we'd create a multi-let.

This all doesn't cover one important use case though: we do not always want to give names to all the arguments of a Data constructor. Say we have a sum type and we're interested in the second field of its third constructor, it'd be wasteful to assign a name for each other field of that constructor, of which there can be many.

So what we can do instead of multi-lambdas or CaseBind is to use regular lambdas and move multi-let construction from the constraint on uni right into the AST. Our example becomes

case (Constr 1 [a, b, c])
  branch0
  (\xyz -> let <x, y, z> = xyz in body1)
  branch2

where let <x, y, z> = <...> in <...> is a made up syntax for the Let AST node:

    | Let !ann ![name] !(Term name uni fun ann) !(Term name uni fun ann)

This may (or may not) be slightly less efficient than the previously described designs, but it supports the case where we preprocess the list to remove redudant elements from it before binding the rest. It also generalizes well to pairs:

let <x, y> = (a, b) in <...>

As a bonus, we'll also be able to sneak in the usual let binding a single variable, as per #4797.

AST-level let will require yet another constraint on uni. Alternatively, we could introduce a Row type that the AST is aware of, specify that multi-let only works on Rows and add a function toRow converting lists to rows, which will make the branch from example look like this:

(\xyz -> let <x, y, z> = toRow xyz in body1)

But that's a bunch of pointless hoops to jump through, so I believe that for the sake of performance it's better to poke another builtins-related hole in the AST by allowing lets to operate on builtins, if we're gonna do that with case anyway.

Plan

This suggests this three-step plan:

  1. casing on Bools and Integers
  2. casing on Data with the list of the fields of the chosen constructor getting bound using a regular lambda
  3. multi-lets

I'm not yet convinced this refinement of the original plan is optimal, but it sounds most promising to me.

Iteration 3

We've now implemented casing on booleans, integers, unit, pairs, lists the way it's specified above, particularly using this approach for the last two:

So instead of evaluating the application directly we can do what we did with pattern matching builtins and simply construct the application from within that uni constraint to let the evaluator (whatever it is) choose how to handle it.

Now we're looking into casing on Data. I had two arguments against multi-lambdas, this was the first one:

But we haven't even begun to discuss how we're gonna handle typing and compilation from higher-level langs. How are we gonna construct multi-lambdas in Plinth for example? We can't just use regular lambdas, because even with optimizations turned off GHC still feels free to rearrange the code. The only reason why we care about multi-lambdas this much is because it gives us the exactness of application, but that's precisely what GHC may feel compelled to mess with.

I'm going to disregard it. Firstly, UPLC is far more important than Plinth and so the latter shouldn't constrain the former. Secondly, since #7271 we use Template Haskell for casing on Integer anyway, so we can always just add more Template Haskell stuff and that should be expressive enough to cover whatever needs we may have with multi-lambdas.

The second argument was

This all doesn't cover one important use case though: we do not always want to give names to all the arguments of a Data constructor. Say we have a sum type and we're interested in the second field of its third constructor, it'd be wasteful to assign a name for each other field of that constructor, of which there can be many.

and that one makes sense, but the answer to it isn't ditching multi-lambdas -- it's making them more expressive.

So we're back to having competing approaches:

  1. pattern-matching let (which we should probably name match)
  2. multi-lambdas + list processing performed via builtins
  3. irrefutably pattern-matching lambdas, which I'm gonna call match-lambdas for short

Here's an example of a match-lambda:

\(x : _ : xs) -> body

It's meant to do exactly what it does in Haskell: take a list, assign x to its head and xs to the tail of its tail, skipping the second argument.

With this we can rewrite our let-based example

case (Constr 1 [a, b, c])
  branch0
  (\xyz -> let <x, y, z> = xyz in body1)
  branch2

as

case (Constr 1 [a, b, c])
  branch0
  (\(x : y : z : []) -> body1)
  branch2

where the language of patterns handling pairs and lists (initially, we can add more stuff in future) is something like

data PatternSeparator
    = PatternComma
    | PatternCons

data Pattern
    = PatternWildcard
    | PatternName Name
    | PatternSeparator PatternSeparator Pattern Pattern
    | PatternNil

Now the challenge is balancing costing with efficiency. Such lambdas give us deep pattern matching and we probably don't want to do anything deeply without costing intermediate steps. It's likely not dangerous not to cost them, because patterns aren't computable dynamically anyway (maybe they should become dynamic some day though, given that it's natural for deep processing to be dynamic) and so the cost is going to be limited by and proportional to the size of the script, for which we already charge, but I'd rather do things properly and cost intermediate steps.

Costing intermediate steps means that either every part of a deep pattern match needs to happen within the CEK machine with a bunch of frame creation and processing or using the same idea that we use with builtins and emitting all costs at once but lazily. The latter is probably more efficient, but also harder and much more error-prone. Unless we implement it through the actual builtins machinery using some sort of Java's XPath-like eDSL, like we've discussed it a few times, in which case maybe multi-lambdas are the right answer, since we'll only need binding multiple arguments and all list preparation (like dropping unused elements) is gonna be deferred to builtins.

Why only irrefutable pattern matching and not full-scale pattern matching where alternatives are tried in order until a successful match is performed? Because Case is gonna pick the right branch, we don't need to iterate through various alternatives, plus designing such a pattern matching system would be a major undertaking, while we're only trying to get casing on Data work.

Which approach we're gonna pick and what flavor of it is gonna take time to discuss, however we can probably settle on an approach to casing on Data right now: whether we decide to use let or match-lambdas in the end, either way the branch of a case-Constr is just a function. So as long as we can agree that match-lambdas should get applied via regular function application, we can settle on case-Constr elaborating to a function application (which is exactly what we do for pairs and lists).

In other words, the uncertainty isn't around how to case on Data -- it's around how to process the list once a case-Constr is performed. So we may as well just implement casing on Data right now even before the hardfork in a way that is consistent with what we do with pairs and lists (and how it was originally planned in the Plan section above) -- and handle deep list processing afterwards.

All we need to agree on is that in

(\(x : _ : xs) -> body) [1, 2, 3]

the space between ) and [ is regular function application and not some separate not-yet-existent AST node.

Well, that and the fact that there's no other approaches to marrying Case and Data.

Summary

It seems like this path will allow us to iteratively improve the status quo w.r.t. performance and ultimately resolve the long-standing issue of Data processing being slow, which is among the most pressing issues with UPLC at the moment.

effectfully avatar Oct 28 '24 22:10 effectfully

Great this is exactly what I would like to make compilation simpler and more efficient.

MicroProofs avatar Oct 29 '24 15:10 MicroProofs

Very exciting stuff! It's nice to see more support around directly interacting with BuiltinData.

colll78 avatar Oct 30 '24 00:10 colll78

(i) Thanks for thinking along these lines. Obviously, additional instructions to speed up data processing are a good idea.

(ii) Since there are a finite number of built-in types, an alternative would be to add a new case for each: caseBool, caseInteger, caseData, etc. Presumably that would be slightly more efficient, because it will not be necessary at runtime to determine what kind of data is the motive of the case. (Here I use the following terminology: case motive branch_0 ... branch_n. Everyone uses branch, but motive is not so widely known.) Is there some reason for lumping them all into one instruction?

(iii) Your explanation of the meaning of case for Integer is incomplete. In case motive branch_0 ... branch_n what happens if the motive is less than zero or greater than n? Do you intend for branch_n to be a default?

(iv) Your explanation of the meaning of case for Data is incomplete. You gave only this example

 case (Constr 1 [a, b, c])
    branch0
    (\[x, y, z] -> body1)
    branch2

You don't explain the role of branch0 or branch2, nor how the term is supposed to behave in general.

wadler avatar Nov 09 '24 11:11 wadler

Since there are a finite number of built-in types, an alternative would be to add a new case for each: caseBool, caseInteger, caseData, etc. <...> Is there some reason for lumping them all into one instruction?

Yes:

  • we currently only have 4 bits to encode AST constructors, so going beyond that will either require complicated gymnastics with reserving, say, 1111 to mean "and now expect more bits for the extended set of AST constructors" -- which would be somewhat annoying to maintain and slower than necessary. Or it'll require us to create a new Plutus version with a completely different binary encoding and then we need to maintain two different binary encodings, so again annoying to maintain
  • if we add caseBool, caseInteger, caseData to the AST, we'll have to update all the code doing something with the AST, of which there's a lot: all transformation passes, all analysis passes (e.g. uniques, purity), serialization/deserialization, parsing/pretty-printing etc. This is all doable, but it'll further delay the change, for which users have been asking us for years already probably. Plus it'll again increase maintenance costs
  • adding all those functions as separate constructors is pretty much giving up on the original design of UPLC being an extensible but conservative untyped lambda calculus. Which is perhaps fine, but then we should do it consistently and give up on the current extensibility entirely, because it's weird to have caseBool in the AST and not booleans themselves

Presumably that would be slightly more efficient, because it will not be necessary at runtime to determine what kind of data is the motive of the case.

Slightly faster at runtime, slightly slower to decode, who knows if it'll pay out. But dispatching on type of the scrutinee is just a bit of constant overhead, the problem that we're addressing here is a ridiculous amount of existing overhead. So I don't think we need a perfect solution here, just one that solves the problem.

If it some point we decide that we want to hardcode builtins into the AST, we can do it one fell swoop and create an entire new UPLC, I don't personally see much value in breaking the current design just to save some constant costs here and there. We can get much more out of giving up on extensibility if we want to.

(iii) Your explanation of the meaning of case for Integer is incomplete. In case motive branch_0 ... branch_n what happens if the motive is less than zero or greater than n? Do you intend for branch_n to be a default?

My bad, sorry, I've now specified that the match should fail.

(iv) Your explanation of the meaning of case for Data is incomplete. You gave only this example

 case (Constr 1 [a, b, c])
    branch0
    (\[x, y, z] -> body1)
    branch2

You don't explain the role of branch0 or branch2, nor how the term is supposed to behave in general.

Made it this:

So in the end we want

case (Constr 1 [a, b, c])
  branch0
  (\[x, y, z] -> body1)
  branch2

to evaluate to

(\[x, y, z] -> body) [a, b, c]

which is a multi-lambda \[x, y, z] -> multi-applied to [a, b, c].

I.e. that case means (pseudocode)

let x = a
    y = b
    z = c
in body1

and branch selection with Data will work the same way as with SOPs: for the Constr i args scrutinee the ith branch gets picked, so given that i is 1 in the example above, we pick the 1th branch and drop the 0th and the 2th ones (we only have the 2th one to make the example a bit more rich, nothing would change if it wasn't there). Except SOPs don't use saturated lambdas and pattern matching on Data has to (for reasons discussed in #5777 and #6225).

Hope this is clearer. Some of that context is in the #5777 and #6225 and I didn't want to duplicate it here. Sorry for any confusion caused!


Just to nitpick:

(Here I use the following terminology: case motive branch_0 ... branch_n. Everyone uses branch, but motive is not so widely known.)

We normally refer to x in case x <...> as "scrutinee" in docs and papers, which seems to be common. A motive, as defined by McBride at least, is the type you're eliminating into, i.e. in our case the type of the body of each of the branches (which has to be the same for all branches).

effectfully avatar Nov 11 '24 14:11 effectfully

Thanks for the response.

The name "scrutinee" is specific and easier to understand than "motive"; good choice. You may be right that I've misunderstood the latter term.

Thanks for the clarification of why you are not defining separate operators for each sort of scrutinee. That should make its way into the design document. I agree with your reasoning, especially the point about not wanting to overuse the four bits available to encode AST constructors.

Thanks also for the clarification of the meaning of the various operators.

wadler avatar Nov 11 '24 20:11 wadler

This seems a very good low hanging fruit - at least the Bool part.

We don't necessarily need to enable casing on all builtin types, nor do we need to enable them together, right? If we can focus on making the Bool part work, that should already be very impactful and welcome.

I don't get why multi-lambda is needed to support Data. Why can't we have the following?

 case (Constr 1 [a, b, c])
    branch0
    (\x y z -> body1)
    branch2

And if we make Data work, there should be much less need to support Integer.

zliu41 avatar Nov 22 '24 00:11 zliu41

If my calculations are correct, IfThenElse is 7 words and Case with two Terms inside of its Vector is 16 words (plus whatever the underlying Array# needs, for which this is probably a relevant piece of info), which is a pretty large overhead.

Are you talking about the size of serialised scripts or the memory footprint of the CEK machine? It sounds like the latter, and if so I don't think it is even a slight concern.

There's still the limitation discussed in the "What exactly breaks if we add the unsafe constrTermFromConstrData?" section of https://github.com/IntersectMBO/plutus/issues/5777, but that one we should be able to address via multi-lambdas/multi-applications (see https://github.com/IntersectMBO/plutus/issues/6225).

I still don't understand why multi-lambda, not just single-lambda bound to a list (which is exactly what caseData and matchData do) - isn't that sufficient to address the unsafeness problem? (By the way, there are multiple issues and discussions on this topic, and I find it hard to tell whether any particular question has been discussed or addressed somewhere)

Whatever we end up implementing has to be expressible as a constraint on uni and be pluggable into any evaluator. CEK machine frame manipulation is way too specific for that.

So as I understand the whole purpose of multi-lambda is to ensure the length of [a, b, c] is the same as the length of [x, y, z]. Once this is verified, the evaluation can proceed in the same way as evaluating single lambdas. When you say "the constraint on uni", do you mean this condition? Or something entirely different?

We can't just use regular lambdas, because even with optimizations turned off GHC still feels free to rearrange the code.

This is not an issue with most other languages. For Plinth, I think we can wrap the regular lambda in an opaque type. Also, doesn't what you are proposing (using Let) have the same problem?

| Let !ann ![name] !(Term name uni fun ann)

Don't you need one more component? There are three components in let <x, y, z> = xyz in body1: <x, y, z>, xyz and body1.

zliu41 avatar Apr 08 '25 00:04 zliu41