plutus icon indicating copy to clipboard operation
plutus copied to clipboard

Why inputs for TX are treated as Set?

Open anton-k opened this issue 3 years ago • 9 comments

Summary

If inputs are set it is impossible to rely on order of inputs. this forces us to do redundant filter-like searches instead of just pattern matching on list of inputs.

In the case of outputs it's convemient that outputs are list and we can do

fooOut : barOut : bazOut : _ = txInfoOutputs info

Instead of more resource heavy filtering:

fooOut = head $ filter  fooPred (txInfoOutputs info)
barOut = head $ filter  barPred (txInfoOutputs info)
bazOut = head $ filter  bazPred (txInfoOutputs info)

but because inputs form not a list but set it can not be done for inputs. This leads to redundant filtering.

Steps to reproduce the behavior


Actual Result

I'd like to be able to pattern match on list of inputs inside validator and rely on order. It's true that spend inputs are unpredictable but we can move all arbitrary user inputs to the end of the list and have easy to use access to inputs that spend scripts and which are useful for checks inside validator.

Expected Result


Describe the approach you would take to fix this

No response

System info


anton-k avatar Dec 17 '21 11:12 anton-k

I don't know what you're referring to. Inputs are a list in TxInfo. If you're referring to the fact that inputs in transactions themselves are a set, then you need to open an issue with the ledger, which I think is highly unlikely to change.

michaelpj avatar Dec 17 '21 12:12 michaelpj

Yeah, they are list in the TxInfo but they go over internal set...

anton-k avatar Dec 17 '21 12:12 anton-k

I think what you are saying is that you cannot predict the ordering of the list, because it's based on the set in the real transaction.

michaelpj avatar Dec 17 '21 12:12 michaelpj

That's right, this leads to some redundant checks or filter searches. That we could avoid if we were sure that order will be the same in validator as in the TX.

anton-k avatar Dec 17 '21 12:12 anton-k

I think what you are saying is that you cannot predict the ordering of the list, because it's based on the set in the real transaction.

So the inputs aren't sorted or anything like that? The order just matches the order in the transaction, which is determined by the submitter I assume?

L-as avatar Dec 29 '21 19:12 L-as

No, it does not match the order in the original serialization, but rather the internal sort order used by the ledger.

michaelpj avatar Jan 04 '22 11:01 michaelpj

Do you have any tips for where to look for this sort order? is it defined in https://github.com/input-output-hk/cardano-ledger or do I have to look through the node implementation to figure it out?

L-as avatar Jan 04 '22 22:01 L-as

@L-as the ordering is the lexicographic ordering on pairs (TransactionID, TransactionIndex). @michaelpj do you know if there is a place in our documentation where it would be helpful to include this detail?

JaredCorduan avatar Feb 08 '22 16:02 JaredCorduan

I guess it would be in the spec?

michaelpj avatar Feb 08 '22 17:02 michaelpj

No, it does not match the order in the original serialization, but rather the internal sort order used by the ledger.

Sounds like this is the expected behavior of Plutus then.

@michaelpj however given the confusion, maybe it's worth clarifying the alignment in Plutus docs too? Or was it already done at some point?

effectfully avatar Feb 02 '23 22:02 effectfully

I think this should be in the part of the ledger spec that explains how the script context is converted into Data, which doesn't currently exist.

michaelpj avatar Feb 03 '23 16:02 michaelpj

I think this should be in the part of the ledger spec that explains how the script context is converted into Data, which doesn't currently exist.

OK, I've opened a ticket on the ledger side and I'm closing this one.

effectfully avatar Feb 03 '23 16:02 effectfully

If we specify (as is currently true) that the transaction inputs in the script context are in lex order, then we are essentially promising that plutus scripts can depend on this. ie that we will never change it. with the advantage of make scripts more efficient. And a bug in the ledger code (breaking this promises, accidentally) would lead to problems in Plutus scripts.

If we make it clear that we make no promises, then we give ourselves flexibility. at the cost of this filtering.

In general, it's unclear to me where to draw the line on what ledger internals we should let Plutus scripts depend on.

JaredCorduan avatar Feb 03 '23 17:02 JaredCorduan

Well, we should specify what happens, because it's part of the observable behaviour of the chain. You could not write a competing node without reproducing this behaviour, therefore it should be specified. I don't think we have the liberty to declare that it's undefined behaviour.

michaelpj avatar Feb 03 '23 17:02 michaelpj

In this instance, I think it is fair, but in general, are we promising that everything that the ledger does for the script context is a promise forever? Ie the code is the true spec? I guess if, say, another implementation sent the inputs in a different order, we can never control what assumptions scripts might make, and such a divergence would lead to problems (even if the spec itself made it clear that it was undefined...).

JaredCorduan avatar Feb 03 '23 17:02 JaredCorduan

FWIW scripts rely on this behaviour to optimise operations on maps.

L-as avatar Feb 04 '23 12:02 L-as

In this instance, I think it is fair, but in general, are we promising that everything that the ledger does for the script context is a promise forever? Ie the code is the true spec?

Yes, we are. The script context object is observable by scripts, and so any changes to it can affect at least e.g. the budget and therefore whether or not scripts validate. This is why it's such a problem that we haven't specified it properly.

michaelpj avatar Feb 06 '23 10:02 michaelpj

Anytime we've discussed a change and noticed that it would be observable by scripts, we've gone out of our way to do something different which was not observable, even when is seemed extremely unlikely to ever manifest, so I'm absolutely sympathetic to the concern.

But promising that we will never make a change observable by scripts is very daunting. It is daunting for two reasons:

  • The ledger is comprised of many lines of code, what human could say with the utmost confidence that they could list out every observable fact about the script context? How would anyone ever know if they've exhausted the list? I do think I've got most (nearly all?) of it in my head, Peter and I came up with a pretty good list, but there will always be a lingering doubt that we've missed something.
  • We are in the Rumsfeld "unknown unknowns" territory, making promises about a future that is hard to predict while ruling out future possibilities. (I do realize that we can have new assumptions in new versions of Plutus, but it can still be extremely difficult to maintain invariants for the existing languages that have no expiration).

The alternative that I see is this: write down an explicit list of assumptions that script authors can assume about the script context, write property tests for this list, and declare everything not on the list as undefined behavior that should not be relied upon.

JaredCorduan avatar Feb 06 '23 19:02 JaredCorduan

The alternative that I see is this: write down an explicit list of assumptions that script authors can assume about the script context, write property tests for this list, and declare everything not on the list as undefined behavior that should not be relied upon.

I think the alternative is to fully specify the creation of the script context. Note that doesn't mean specifying everything about the ledger, just how the script context is created. Since it mostly re-presents information from the transaction, I don't expect that to be so difficult, apart from the odd thing like ordering.

I also just cannot see how we can ever say anything is "undefined behaviour". If behaviour is observable by scripts, and there are scripts in the history of the chain that depend on that to validate, then if we change that then we break history. It's just like everything else.

michaelpj avatar Feb 07 '23 11:02 michaelpj

I think the alternative is to fully specify the creation of the script context. Note that doesn't mean specifying everything about the ledger, just how the script context is created. Since it mostly re-presents information from the transaction, I don't expect that to be so difficult, apart from the odd thing like ordering.

Maybe it's not so bad, and my time is better spent attempting the task than chatting :laughing:

I also just cannot see how we can ever say anything is "undefined behaviour". If behaviour is observable by scripts, and there are scripts in the history of the chain that depend on that to validate, then if we change that then we break history. It's just like everything else.

My concern is not about breaking history, indeed we cannot do that. I am worried about what we can and cannot change with a hard fork.

JaredCorduan avatar Feb 07 '23 12:02 JaredCorduan

Maybe it's not so bad, and my time is better spent attempting the task than chatting

FWIW Matthias is already annoyed that it's not specified, and I basically promised that I'd specify it for the new SOP version: https://github.com/cardano-foundation/CIPs/pull/455#issuecomment-1411727761

Let's talk about how to do the work best, but I think the Plutus team can chip in here.

I am worried about what we can and cannot change with a hard fork.

Assuming we keep the old behaviour for old scripts then we can change whatever we like! This is covered by the ledger-script interface part of CIP-35, and I think it's fine to do in a new Plutus ledger language. We could do it on a protocol version change, but given that we need to retain the old behaviour anyway, I think we should generally do the usual thing and do it with a new Plutus ledger language.

michaelpj avatar Feb 07 '23 12:02 michaelpj

Assuming we keep the old behaviour for old scripts then we can change whatever we like!

you make it sound so easy :lolsob:

JaredCorduan avatar Feb 07 '23 13:02 JaredCorduan

We are brothers in backwards-compatibility-suffering :joy:

michaelpj avatar Feb 07 '23 13:02 michaelpj

Assuming we keep the old behaviour for old scripts then we can change whatever we like!

Imagine we have features a, b, and c in ledger that make their way into the script context. there are all kinds of properties P(a, b, c) that might hold for Plutus VX scripts. Imagine now adding feature d to the ledger. even if d is not used anywhere in some Plutus VX script, P(a, b, c) could fail in the presence of d in the ledger if we do not go out of our way to preserve it.

If we promise to maintain all the current observable facts, then we are promising: "for any idea d that we every come up with in the future, and any property P(a, b, c) of the current ledger, P(a, b, c) will still hold with the introduction of d. (I should probably use some notation like a(d) or something that makes a now depend on d or something...).

Of course I'm happy to make promises about all the Ps that I can think of (assuming they are not too ridiculous), but promises about the ones I've yet to think of are worrisome.

On the one hand, I can hear:

yeesh, Jared, you are making this way more complicated than it needs to be, it's just a simple translation.

But on the other hand, it sounds like exactly the kind of thing that usually bites us.

And I think what you are proposing is something like a "reference implementation" of the script context creation, so that all the Ps become implicit (reducing the infinite to the finite). But even then I worry that this could be too much of a simplification of the real story, and that tendrils of the provenance will find their way all the way back to the wire...

JaredCorduan avatar Feb 07 '23 13:02 JaredCorduan

I think we should get together and figure out what we can and can't promise for the script context. Here are some examples of Ps where weird things happen:

  • We want to remove pointer addresses by turning them into enterprise addresses, but we still have to support them because some script might rely on them. But now we've changed semantics: Let's say a script allows minting some tokens once per epoch proportional to the amount of Ada delegated to some pool. If a pointer is allowed here, this change has now changed the semantics: you'll get the tokens, but the pool owner doesn't get your stake.

  • Let's say we didn't need to keep track of registered reward accounts anymore, and we decide to get rid of the certificates. Well, now some script contexts can't be constructed anymore, so maybe we can't do that and still have to be able to make them, but just as dummies. But now there are some Plutus scripts that might have more steps that they could take, since they can track parts the state of this registry internally. So, if we want to preserve the semantics, we have to keep the current system in place forever.

I think the introduction of new features might be less problematic (we can always disallow old Plutus versions in scripts with new features), but I think that if we wanted to make the strongest possible promise here large parts of the ledger would be completely unchangeable forever.

WhatisRT avatar Feb 07 '23 14:02 WhatisRT

Ah, I think I see. Let me make it concrete: let's talk about the sorting order of the inputs. Suppose that they are initially in using ordering O1, and we want to make a change to O2.

I think:

  • We cannot say "haha, the ordering is unspecified, we're just going to change it silently and break literally everything"
  • We can say "we're going to change the ordering at a particular point using our usual mechanism for backwards compatibility"
  • We don't have to say "we'll maintain that particular ordering forever"

Let's say a script allows minting some tokens once per epoch proportional to the amount of Ada delegated to some pool

Scripts can't tell how much Ada is delegated to a pool, but I take your point. I didn't quite follow the other example but I agree it would be good to get together and think it through.

michaelpj avatar Feb 07 '23 14:02 michaelpj

Let me make it concrete: let's talk about the sorting order of the inputs. Suppose that they are initially in using ordering O1, and we want to make a change to O2.

I definitely appreciate you making a concrete example. but just to be clear to everyone following along, I do think that the ordering of the transaction inputs in the script context is a property that we could make about the V1 and V2 context for eternity. Unfortunately I don't have a better example in mind atm (hence the unknown unknowns).

JaredCorduan avatar Feb 07 '23 15:02 JaredCorduan

Was it an intentional decision somewhere to sort transactions in this way? I.e. is there a good reason why this was done, or is it just an accident of the datastructures used?

Last I dug into it, there was an explicit step when building the script context to sort these inputs (though I'm no good at navigating haskell codebases, so I can't find it again), so it didn't feel like it was an artifact of "this is how the ledger stores things internally".

For reference, this sort order has caused some consternation for at least two of the major DEX's on Cardano (SundaeSwap / MinSwap), and is limiting throughput. If you try to batch transactions, and do so in a way that preserves the on-chain order, this (quite surprising) sorting destroys that.

The only solution is to add an extra field in the redeemer and sort them in the script (costly), or to segment the batching into monotonic subsequences, which hurts protocol throughput.

I understand that the ship has sailed for PlutusV1/V2, likely (as, beyond even just execution units, the actual outcome of the transaction depends on this sort order now); but it'd be good to document somewhere not just that this is the behavior, but why: i.e. an accident of history, or xyz good reason.

That way when designing future Plutus versions, we can either revisit that decision, or continue to intentionally benefit from xyz good reason.

Quantumplation avatar Feb 08 '23 06:02 Quantumplation

This is mostly an accident caused by a design decision made for Shelley (order didn't matter, so we put things in a set instead of a list). However, I still maintain that it's actually better for the ecosystem this way. There has been a bunch of discussion here: https://github.com/cardano-foundation/CIPs/pull/231

WhatisRT avatar Feb 08 '23 12:02 WhatisRT

Last I dug into it, there was an explicit step when building the script context to sort these inputs (though I'm no good at navigating haskell codebases, so I can't find it again), so it didn't feel like it was an artifact of "this is how the ledger stores things internally".

It actually is how the ledger stores things internally.

There are a few things going on here, I will try to explain.

On the wire, using CBOR, the transaction inputs are just in a list:

  • https://github.com/input-output-hk/cardano-ledger/blob/ac405a977557a7c58ce1cf69d3c2a0bf148cf19f/eras/babbage/test-suite/cddl-files/babbage.cddl#L54
  • https://github.com/input-output-hk/cardano-ledger/blob/ac405a977557a7c58ce1cf69d3c2a0bf148cf19f/eras/babbage/test-suite/cddl-files/babbage_extra.cddl#L4

(Arguably, we should have used the CBOR set tag, #258, but we didn't.)

This list does not have to be sorted, but some services (like the CLI) might sort them anyway.

Now, inside the transaction body, we store the transaction inputs as a Set. We think of them semantically as sets, and we also use Haskell's Data.Set. I (we?) don't think of sets as inherently ordered, so in order to provide an index for the transaction input, we needed something, and lex ordering seemed a fine choice (though with hindsight 20/20 we should have at least more carefully considered the serialization ordering. or more like hindsight 2018 :laughing: ). The code for placing new outputs in the UTxO has been basically unchanged since Shelley. Moreover, this is the same order that they are now passed to the script context. And, very importantly, this is the order that the redeemers need to use.


If you try to batch transactions, and do so in a way that preserves the on-chain order, this (quite surprising) sorting destroys that.

Do you have a minimal example of this? I feel like I might have read a post about this a while back... :thinking:

JaredCorduan avatar Feb 08 '23 14:02 JaredCorduan