plutus
Why are TX inputs treated as a set?
Summary
If inputs are a set, it is impossible to rely on the order of inputs. This forces us to do redundant filter-like searches instead of just pattern matching on the list of inputs.
In the case of outputs it's convenient that outputs are a list, so we can do
fooOut : barOut : bazOut : _ = txInfoOutputs info
instead of the more resource-heavy filtering:
fooOut = head $ filter fooPred (txInfoOutputs info)
barOut = head $ filter barPred (txInfoOutputs info)
bazOut = head $ filter bazPred (txInfoOutputs info)
But because the inputs form not a list but a set, this cannot be done for inputs, which leads to redundant filtering.
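The difference can be sketched in plain Haskell. `Out` and the address tags below are hypothetical stand-ins for the real `TxOut` type and predicates; the point is only that a positional pattern match is a single step, while each `find`/`filter` call rescans the list:

```haskell
import Data.List (find)

-- Hypothetical stand-in for TxOut: an address tag and a value.
data Out = Out { addr :: String, val :: Integer } deriving (Show, Eq)

-- If the order is known, one pattern match suffices.
splitKnownOrder :: [Out] -> Maybe (Out, Out, Out)
splitKnownOrder (foo : bar : baz : _) = Just (foo, bar, baz)
splitKnownOrder _                     = Nothing

-- If the order is unknown, each output needs its own linear search.
splitBySearch :: [Out] -> Maybe (Out, Out, Out)
splitBySearch outs =
  (,,) <$> find ((== "foo") . addr) outs
       <*> find ((== "bar") . addr) outs
       <*> find ((== "baz") . addr) outs

main :: IO ()
main = do
  let outs = [Out "foo" 1, Out "bar" 2, Out "baz" 3, Out "user" 4]
  print (splitKnownOrder outs)
  print (splitBySearch outs)
```

Both recover the same outputs when the order happens to match, but only the second works when it does not; that extra searching is the cost being described for inputs.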
Steps to reproduce the behavior
Actual Result
I'd like to be able to pattern match on the list of inputs inside a validator and rely on their order. It's true that spending inputs are unpredictable, but we could move all arbitrary user inputs to the end of the list and have easy access to the inputs that spend scripts, which are the ones useful for checks inside a validator.
Expected Result
Describe the approach you would take to fix this
No response
System info
I don't know what you're referring to. Inputs are a list in TxInfo
. If you're referring to the fact that inputs in transactions themselves are a set, then you need to open an issue with the ledger, which I think is highly unlikely to change.
Yeah, they are a list in TxInfo, but they pass through an internal set...
I think what you are saying is that you cannot predict the ordering of the list, because it's based on the set in the real transaction.
That's right. This leads to redundant checks or filter searches that we could avoid if we were sure the order in the validator would be the same as in the TX.
I think what you are saying is that you cannot predict the ordering of the list, because it's based on the set in the real transaction.
So the inputs aren't sorted or anything like that? The order just matches the order in the transaction, which is determined by the submitter I assume?
No, it does not match the order in the original serialization, but rather the internal sort order used by the ledger.
Do you have any tips for where to look for this sort order? is it defined in https://github.com/input-output-hk/cardano-ledger or do I have to look through the node implementation to figure it out?
@L-as the ordering is the lexicographic ordering on pairs (TransactionID, TransactionIndex)
. @michaelpj do you know if there is a place in our documentation where it would be helpful to include this detail?
I guess it would be in the spec?
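That lexicographic ordering on (TransactionID, TransactionIndex) pairs is just the ordering a derived `Ord` instance gives a pair-like type. A minimal illustration (the `TxIn` type here is a hypothetical stand-in, not the ledger's actual definition):

```haskell
import Data.List (sort)

-- Hypothetical input reference: a transaction id (as a hex string) and an index.
-- The derived Ord compares the id first, then the index — lexicographic order.
data TxIn = TxIn { txId :: String, txIx :: Integer }
  deriving (Show, Eq, Ord)

main :: IO ()
main =
  -- Sorts ascending by (txId, txIx): aa#0, aa#1, bb#0.
  print (sort [TxIn "bb" 0, TxIn "aa" 1, TxIn "aa" 0])
```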
No, it does not match the order in the original serialization, but rather the internal sort order used by the ledger.
Sounds like this is the expected behavior of Plutus then.
@michaelpj however given the confusion, maybe it's worth clarifying the alignment in Plutus docs too? Or was it already done at some point?
I think this should be in the part of the ledger spec that explains how the script context is converted into Data, which doesn't currently exist.
I think this should be in the part of the ledger spec that explains how the script context is converted into Data, which doesn't currently exist.
OK, I've opened a ticket on the ledger side and I'm closing this one.
If we specify (as is currently true) that the transaction inputs in the script context are in lex order, then we are essentially promising that Plutus scripts can depend on this, i.e. that we will never change it, with the advantage of making scripts more efficient. And a bug in the ledger code (breaking this promise accidentally) would lead to problems in Plutus scripts.
If we make it clear that we make no promises, then we give ourselves flexibility, at the cost of this filtering.
In general, it's unclear to me where to draw the line on what ledger internals we should let Plutus scripts depend on.
Well, we should specify what happens, because it's part of the observable behaviour of the chain. You could not write a competing node without reproducing this behaviour, therefore it should be specified. I don't think we have the liberty to declare that it's undefined behaviour.
In this instance, I think it is fair, but in general, are we promising that everything the ledger does for the script context is a promise forever? I.e. is the code the true spec? I guess that, say, if another implementation sent the inputs in a different order, we can never control what assumptions scripts might make, and such a divergence would lead to problems (even if the spec itself made it clear that it was undefined...).
FWIW scripts rely on this behaviour to optimise operations on maps.
In this instance, I think it is fair, but in general, are we promising that everything the ledger does for the script context is a promise forever? I.e. is the code the true spec?
Yes, we are. The script context object is observable by scripts, and so any changes to it can affect at least e.g. the budget and therefore whether or not scripts validate. This is why it's such a problem that we haven't specified it properly.
Anytime we've discussed a change and noticed that it would be observable by scripts, we've gone out of our way to do something different that was not observable, even when it seemed extremely unlikely to ever manifest, so I'm absolutely sympathetic to the concern.
But promising that we will never make a change observable by scripts is very daunting. It is daunting for two reasons:
- The ledger comprises many lines of code; what human could say with the utmost confidence that they could list out every observable fact about the script context? How would anyone ever know they've exhausted the list? I do think I've got most (nearly all?) of it in my head, and Peter and I came up with a pretty good list, but there will always be a lingering doubt that we've missed something.
- We are in Rumsfeld "unknown unknowns" territory, making promises about a future that is hard to predict while ruling out future possibilities. (I do realize that we can introduce new assumptions in new versions of Plutus, but it can still be extremely difficult to maintain invariants for the existing languages, which have no expiration.)
The alternative that I see is this: write down an explicit list of assumptions that script authors can assume about the script context, write property tests for this list, and declare everything not on the list as undefined behavior that should not be relied upon.
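A property test for one entry on such a list — say, "the inputs a script sees are in ascending order" — could look something like the following sketch. The types and the `contextInputs` function are toy stand-ins for the real ledger code, and in practice this boolean would be fed to a property-testing framework such as QuickCheck:

```haskell
import Data.List (sort)
import qualified Data.Set as Set

-- Toy stand-in for the ledger's input type: (transaction id, output index).
type TxIn = (String, Integer)

-- Toy stand-in for "build the script-context input list from the tx's input set".
contextInputs :: Set.Set TxIn -> [TxIn]
contextInputs = Set.toAscList

-- The property under test: the list handed to scripts is sorted ascending,
-- regardless of the order the inputs arrived in.
prop_inputsSorted :: [TxIn] -> Bool
prop_inputsSorted ins =
  let ctx = contextInputs (Set.fromList ins)
  in ctx == sort ctx

main :: IO ()
main = print (prop_inputsSorted [("bb", 0), ("aa", 1), ("aa", 0)])
```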
The alternative that I see is this: write down an explicit list of assumptions that script authors can assume about the script context, write property tests for this list, and declare everything not on the list as undefined behavior that should not be relied upon.
I think the alternative is to fully specify the creation of the script context. Note that doesn't mean specifying everything about the ledger, just how the script context is created. Since it mostly re-presents information from the transaction, I don't expect that to be so difficult, apart from the odd thing like ordering.
I also just cannot see how we can ever say anything is "undefined behaviour". If behaviour is observable by scripts, and there are scripts in the history of the chain that depend on that to validate, then if we change that then we break history. It's just like everything else.
I think the alternative is to fully specify the creation of the script context. Note that doesn't mean specifying everything about the ledger, just how the script context is created. Since it mostly re-presents information from the transaction, I don't expect that to be so difficult, apart from the odd thing like ordering.
Maybe it's not so bad, and my time is better spent attempting the task than chatting :laughing:
I also just cannot see how we can ever say anything is "undefined behaviour". If behaviour is observable by scripts, and there are scripts in the history of the chain that depend on that to validate, then if we change that then we break history. It's just like everything else.
My concern is not about breaking history, indeed we cannot do that. I am worried about what we can and cannot change with a hard fork.
Maybe it's not so bad, and my time is better spent attempting the task than chatting
FWIW Matthias is already annoyed that it's not specified, and I basically promised that I'd specify it for the new SOP version: https://github.com/cardano-foundation/CIPs/pull/455#issuecomment-1411727761
Let's talk about how to do the work best, but I think the Plutus team can chip in here.
I am worried about what we can and cannot change with a hard fork.
Assuming we keep the old behaviour for old scripts then we can change whatever we like! This is covered by the ledger-script interface part of CIP-35, and I think it's fine to do in a new Plutus ledger language. We could do it on a protocol version change, but given that we need to retain the old behaviour anyway, I think we should generally do the usual thing and do it with a new Plutus ledger language.
Assuming we keep the old behaviour for old scripts then we can change whatever we like!
you make it sound so easy :lolsob:
We are brothers in backwards-compatibility-suffering :joy:
Assuming we keep the old behaviour for old scripts then we can change whatever we like!
Imagine we have features a, b, and c in the ledger that make their way into the script context. There are all kinds of properties P(a, b, c) that might hold for Plutus VX scripts. Imagine now adding feature d to the ledger. Even if d is not used anywhere in some Plutus VX script, P(a, b, c) could fail in the presence of d in the ledger if we do not go out of our way to preserve it.
If we promise to maintain all the current observable facts, then we are promising: "for any idea d that we ever come up with in the future, and any property P(a, b, c) of the current ledger, P(a, b, c) will still hold with the introduction of d." (I should probably use some notation like a(d) or something that makes a now depend on d...)
Of course I'm happy to make promises about all the Ps that I can think of (assuming they are not too ridiculous), but promises about the ones I've yet to think of are worrisome.
On the one hand, I can hear:
yeesh, Jared, you are making this way more complicated than it needs to be, it's just a simple translation.
But on the other hand, it sounds like exactly the kind of thing that usually bites us.
And I think what you are proposing is something like a "reference implementation" of the script context creation, so that all the Ps become implicit (reducing the infinite to the finite). But even then I worry that this could be too much of a simplification of the real story, and that tendrils of the provenance will find their way all the way back to the wire...
I think we should get together and figure out what we can and can't promise for the script context. Here are some examples of Ps where weird things happen:
- We want to remove pointer addresses by turning them into enterprise addresses, but we still have to support them because some script might rely on them. But now we've changed the semantics: say a script allows minting some tokens once per epoch, proportional to the amount of Ada delegated to some pool. If a pointer is allowed here, this change has altered the semantics: you'll get the tokens, but the pool owner doesn't get your stake.
- Let's say we didn't need to keep track of registered reward accounts anymore, and we decide to get rid of the certificates. Well, now some script contexts can't be constructed anymore, so maybe we can't do that and still have to be able to make them, just as dummies. But now there are some Plutus scripts that might have more steps they could take, since they can track parts of the state of this registry internally. So, if we want to preserve the semantics, we have to keep the current system in place forever.
I think the introduction of new features might be less problematic (we can always disallow old Plutus versions in scripts with new features), but I think that if we wanted to make the strongest possible promise here large parts of the ledger would be completely unchangeable forever.
Ah, I think I see. Let me make it concrete: let's talk about the sorting order of the inputs. Suppose that they are initially using ordering O1, and we want to change to O2.
I think:
- We cannot say "haha, the ordering is unspecified, we're just going to change it silently and break literally everything"
- We can say "we're going to change the ordering at a particular point using our usual mechanism for backwards compatibility"
- We don't have to say "we'll maintain that particular ordering forever"
Let's say a script allows minting some tokens once per epoch proportional to the amount of Ada delegated to some pool
Scripts can't tell how much Ada is delegated to a pool, but I take your point. I didn't quite follow the other example but I agree it would be good to get together and think it through.
Let me make it concrete: let's talk about the sorting order of the inputs. Suppose that they are initially using ordering O1, and we want to change to O2.
I definitely appreciate you making a concrete example, but just to be clear to everyone following along: I do think the ordering of the transaction inputs in the script context is a property we could promise for the V1 and V2 contexts for eternity. Unfortunately I don't have a better example in mind atm (hence the unknown unknowns).
Was it an intentional decision somewhere to sort transaction inputs in this way? I.e. is there a good reason why this was done, or is it just an accident of the data structures used?
Last I dug into it, there was an explicit step when building the script context to sort these inputs (though I'm no good at navigating haskell codebases, so I can't find it again), so it didn't feel like it was an artifact of "this is how the ledger stores things internally".
For reference, this sort order has caused some consternation for at least two of the major DEXes on Cardano (SundaeSwap / MinSwap), and is limiting throughput. If you try to batch transactions in a way that preserves the on-chain order, this (quite surprising) sorting destroys that.
The only solution is to add an extra field in the redeemer and sort them in the script (costly), or to segment the batching into monotonic subsequences, which hurts protocol throughput.
I understand that the ship has likely sailed for PlutusV1/V2 (since, beyond just execution units, the actual outcome of a transaction now depends on this sort order); but it'd be good to document somewhere not just that this is the behavior, but why: i.e. an accident of history, or xyz good reason.
That way when designing future Plutus versions, we can either revisit that decision, or continue to intentionally benefit from xyz good reason.
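The "extra field in the redeemer" workaround amounts to carrying the intended order as a list of indices into the lexicographically sorted input list, and re-indexing inside the script. A rough sketch with toy types (not the real Plutus API); note that each `!!` is itself a linear walk, which is exactly the on-chain cost being complained about:

```haskell
-- Toy stand-in for an input reference: (transaction id, output index).
type TxIn = (String, Integer)

-- The script sees inputs in lex order; the redeemer carries the submitter's
-- intended order as indices into that list. Recover it by indexing.
reorder :: [Integer] -> [TxIn] -> [TxIn]
reorder idxs ins = [ ins !! fromIntegral i | i <- idxs ]

main :: IO ()
main = do
  let lexOrdered = [("aa", 0), ("bb", 0), ("cc", 0)]  -- as the script sees them
      intended   = [2, 0, 1]                          -- indices from the redeemer
  print (reorder intended lexOrdered)
```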
This is mostly an accident caused by a design decision made for Shelley (order didn't matter, so we put things in a set instead of a list). However, I still maintain that it's actually better for the ecosystem this way. There has been a bunch of discussion here: https://github.com/cardano-foundation/CIPs/pull/231
Last I dug into it, there was an explicit step when building the script context to sort these inputs (though I'm no good at navigating haskell codebases, so I can't find it again), so it didn't feel like it was an artifact of "this is how the ledger stores things internally".
It actually is how the ledger stores things internally.
There are a few things going on here, I will try to explain.
On the wire, using CBOR, the transaction inputs are just in a list:
- https://github.com/input-output-hk/cardano-ledger/blob/ac405a977557a7c58ce1cf69d3c2a0bf148cf19f/eras/babbage/test-suite/cddl-files/babbage.cddl#L54
- https://github.com/input-output-hk/cardano-ledger/blob/ac405a977557a7c58ce1cf69d3c2a0bf148cf19f/eras/babbage/test-suite/cddl-files/babbage_extra.cddl#L4
(Arguably, we should have used the CBOR set tag, #258, but we didn't.)
This list does not have to be sorted, but some services (like the CLI) might sort them anyway.
Now, inside the transaction body, we store the transaction inputs as a Set. We think of them semantically as sets, and we also use Haskell's Data.Set. I (we?) don't think of sets as inherently ordered, so in order to provide an index for the transaction input, we needed something, and lex ordering seemed a fine choice (though with 20/20 hindsight we should have at least more carefully considered the serialization ordering. Or more like hindsight 2018 :laughing: ). The code for placing new outputs in the UTxO has been basically unchanged since Shelley. Moreover, this is the same order in which they are now passed to the script context. And, very importantly, this is the order that the redeemers need to use.
If you try to batch transactions, and do so in a way that preserves the on-chain order, this (quite surprising) sorting destroys that.
Do you have a minimal example of this? I feel like I might have read a post about this a while back... :thinking: