plutus Undocumented invariants in types with exposed constructors

Summary

Multiple types have undocumented invariants, and also expose their constructors, in modules that are not marked as unsafe or internal. The following is a (non-exhaustive) list of examples:

PubKey (in all variants) must be 64 bytes long.
POSIXTime must be non-negative.
Signature must be 64 bytes long.
DatumHash must be 28 bytes long.
TokenName must be no longer than 32 bytes, and be hex encoded.
TxId must be 28 bytes long.
ValidatorHash must be 28 bytes long.
PubKeyHash (in all variants) must 28 bytes long.

Steps to reproduce the behavior

Attempt to construct any value of any of the listed types which does not follow the specified invariant(s).
Observe how the compiler accepts it.
Try to use it on-chain.
Watch things blow up.

Actual Result

Your transactions will fail, usually by erroring out.

Expected Result

If we can construct a value, it should be usable without erroring. If it has internal invariants, those should be documented at least; better still, the constructors should not be exposed.

Describe the approach you would take to fix this

At minimum, all such invariants must be documented clearly at the definition site of the type, ideally with reasons why they are so. If this is for stuff that is external to the type, links should be provided in the Haddocks to the relevant information. However, in a more ideal world, the constructors of such types should also either inhabit modules clearly marked as unsafe, or not be exposed at all in favour of safe construction methods, such as smart constructors or quasi-quoters.

System info

Plutus commit: 3f3f875764f2e329c8d738f99cdef0c1260ccde1

Mar 17 '22 23:03 kozross

Do the UnsafeFromData instances verify these invariants?

Mar 31 '22 09:03 L-as

And it seems they don't! This seems potentially problematic?

Let's say we have a state machine that is supposed to allow minting a token with a token name that an untrusted user specifies in a previous step. The token name would be put into the state (i.e. the datum) before being used by the code to validate the final transaction that mints the token and releases the funds in the state machine to some specific address.

The developer might (rightfully?) assume that the UnsafeFromData instance verifies the TokenName such that it is in fact possible to have a token with that name. However, currently, a user could supply a 33 byte token name, which would mean that the final transaction is impossible to construct, and thus the funds are locked forever.

This seems like a vulnerability in plutus-ledger-api?

Mar 31 '22 09:03 L-as

These types are not intended to rule out invalid states, but merely to be lightweight newtypes that prevent confusion between values of different types. So more like newtype UserName = UserName Text than newtype WellFormedPublicKey = WellFormedPublicKey ByteString.

I agree that this should be made clear in the documentation, since you both thought otherwise!

Mar 31 '22 10:03 michaelpj

@michaelpj This seems like a really strange decision to me. Even so, these invariants need to be documented very publically.

Mar 31 '22 17:03 kozross

There's nothing wrong with lightweight newtypes, and they're perfectly useful at that job. The problem is indeed if people assume that they're more than that.

Apr 05 '22 10:04 michaelpj

The severity of this issue will be apparent when the practical consequences happen.

Apr 05 '22 10:04 L-as

People tend to assume a lot of things unfortunately, and also, people tend to not read the documentation. Could we expose some smart constructors that validate these values? Is it possible rename these newtype constructors to something containing Unsafe without a new Plutus version?

Apr 06 '22 14:04 szg251

Why is DatumHash/TxId/PubKeyHash 28 bytes? For example DatumHash is described as: -- | Script runtime representation of a @Digest SHA256@. Why are these not 32 bytes? also what are the places where such invariants could be found already documented?

Apr 10 '22 16:04 zmrocze

@zmrocze This is based on MLabs discoveries (usually through stuff blowing up). As far as I am aware, they are documented nowhere.

Apr 10 '22 19:04 kozross

`DatumHash`, `RedeemerHash`, `ValidatorHash`, `MintingPolicyHash`, `StakeValidatorHash`

For types Datum, Redeemer, Validator, StakeValidator, MintingPolicy from plutus-ledger-api/Plutus.V1.Ledger.Scripts there are functions in plutus-ledger/Ledger.Scripts calculating matching hashes.

DatumHash, RedeemerHash boil down to hashScriptData which is Blake2b_256 hash. There is little encoding needed to call this function.
ValidatorHash, StakeValidatorHash and MintingPolicyHash boil down to hashScript which is Blake2b_224 hash, therefore 28 bytes. To calculate hash a PLC script is encoded to bytestring with flat [see note].

So there is a mismatch with plutus-ledger-api docs saying these are sha256 hashes. Also if we were to hide these hash constructors then we should move hashing functions to plutus-ledger-api.

`TxId`

TxId can be at least two different things:

double-sha256 hash of a plutus transaction stripped from input witnesses. As documented.
but also a blake2b_256 hash of a cardano transaction body, calculated using getCardanoTxId or getCardanoApiTxId, used for example in Tutorial.Escrow.

Apr 20 '22 18:04 zmrocze

@michaelpj I just want to get some idea about where to go next, I think adding some docs would be a good first step, but I wonder if we could create smart constructors. I think, exposing some smart constructors that parse the input, and document them in the newtype constructors could save dApp developers from some shootguns. Ideally, we would rename the newtype constructors to Unsafe... (e.g. PubKeyHash -> UnsafePubKeyHash), but as I mentioned earlier, I'm afraid that changing these would require a Plutus language update, am I correct?

Apr 26 '22 10:04 szg251

@gege251 That wouldn't affect the underlying language, so no, it wouldn't need a new versionp

Apr 26 '22 13:04 L-as

TokenName must be no longer than 32 bytes, and be hex encoded.

I got a bit confused by this. What do you mean by "be hex encoded"? The IsString impl for TokenName (and code from the official tutorials) suggests we can use regular english words as TokenName. That is, "ABC" :: TokenName is valid. Is this not the case?

May 03 '22 10:05 TotallyNotChase

It seems that also CurrencySymbol should be treated similarly to other hashes.

Jul 01 '22 13:07 maciej-bendkowski

... and perhaps an empty byte (for Ada)?

Jul 01 '22 13:07 maciej-bendkowski

As of now the hash types used are defined in cardano-ledger/StandardCrypto, where data hashes correspond to HASH (therefore are be 32 byte) and script hashes correspond to ADDRHASH (therefore 28 byte).

So:

DatumHash, RedeemerHash - 32 byte
PubKeyHash, ScriptHash, ValidatorHash, MintingPolicyHash, StakeValidatorHash - 28 byte
CurrencySymbol - empty or 28 byte
TokenName - 0 to 32 bytes
TxId - problematic: will be 32 byte, but in the wild will be a hash of various tx representations (see)

All of the above excluding TokenName use hex encoding for IsString, Show and Pretty, but can be any bytestring. TokenName uses utf-8 for IsString and tries to use it for Show (code)

Also I added this to code docs in #4597, but it sits so far.

Jul 11 '22 12:07 zmrocze

Related discussion.

Can we please prioritize this? We're having to write these checkers ourselves because these types are insufficient for their purpose.

If there's a genuine engineering reason why these types cannot change, can we at least get something to validate that our values are not total nonsense? This is having real impact on multiple projects -- not just developer time and an unreasonably high barrier to entry, but wasting money on audits when official tooling can't get this correct.

Aug 09 '22 01:08 peter-mlabs

The cryptographic details used in the ledger are explained in appendix A of the Shelley ledger specification. In particular, A.1 Hashing, explains which hashes are 32 bytes and which are 28. We use 28 bytes for those hashes that end up in addresses, and 32 everywhere else.

It would probably be worth adding an appendix to the other ledger specs (Multi-assets, Alonzo, and Babbage) to re-affirm this. Especially since sometimes it is not obvious, such as CurrencySymbol. The CurrencySymbol is a script hash (I'm surprised by this name, in fact, I thought we consistently called it the policy ID), and script hashes show up in addresses and are hence 28 bytes.

Aug 09 '22 12:08 JaredCorduan

Thank you for bringing this to our attention.

Regarding inaccurate Haddock: I’ll make a PR to update the Haddock to include more accurate and useful information.

Regarding whether or not to expose the constructors: We are in the process of tidying up our plutus-ledger-api package. It will eventually only contain the types that are actually used by the ScriptContext. Everything else will be deprecated or moved to plutus-script-utils in plutus-apps. Developers can make use of plutus-script-utils, which is under active development and defines utility functions for working with Plutus scripts. We thank you in advance for your patience while we finish this process, which will be after Vasil.

Regarding exposing “unsafe” constructors and adding some “safe” constructors: At the moment we would like to keep these types simple. This is deliberate and is consistent with our approach elsewhere. We do not take the responsibility of ensuring invariants. This is the responsibility of the developer. We will make it clear in the Haddock. We do value your opinions so I’ll make a separate issue about using smart constructors for some types to further discuss it.

Aug 12 '22 14:08 thealmarty

Thanks for your response, Marty.

We do not take the responsibility of ensuring invariants. This is the responsibility of the developer.

I'd like to know whether this statement marks the official IOG designation of WONTFIX -- in speaking to Jared, Andrew, MPJ, and now yourself, there seems to some internal division on whether this should or shouldn't be considered within the domain of plutus-ledger-api. I'm assuming this to be the case, since #4597 was closed without merging.

Aug 12 '22 16:08 peter-mlabs

Sorry for the confusion Peter. We are working on a fix for the documentation and we will cherrypick the first 2 commits of #4597 into it. The third commit includes checking invariants which we haven't decided we want to make the switch yet. We will be collecting thoughts about that in an upcoming issue(I'll post a link here when it's ready). Please feel free to join the discussion there.

Aug 12 '22 17:08 thealmarty

Just my two cents. If the decision is made to hide these constructors, they should still be importable from an "Internal" module to allow for the creation of type class instances.

Aug 12 '22 20:08 jfischoff

If the newtypes don't mean anything, why not just make them type synonyms? That way there will be no confusion wrt. what they mean.

Aug 12 '22 22:08 L-as

Here is the issue.

Aug 16 '22 15:08 thealmarty

plutus plutus copied to clipboard

Undocumented invariants in types with exposed constructors

Summary

Steps to reproduce the behavior

Actual Result

Expected Result

Describe the approach you would take to fix this

System info

DatumHash, RedeemerHash, ValidatorHash, MintingPolicyHash, StakeValidatorHash

TxId

plutus
plutus copied to clipboard

`DatumHash`, `RedeemerHash`, `ValidatorHash`, `MintingPolicyHash`, `StakeValidatorHash`

`TxId`