plutus
plutus copied to clipboard
Undocumented invariants in types with exposed constructors
Summary
Multiple types have undocumented invariants, and also expose their constructors, in modules that are not marked as unsafe or internal. The following is a (non-exhaustive) list of examples:
-
PubKey
(in all variants) must be 64 bytes long. -
POSIXTime
must be non-negative. -
Signature
must be 64 bytes long. -
DatumHash
must be 28 bytes long. -
TokenName
must be no longer than 32 bytes, and be hex encoded. -
TxId
must be 28 bytes long. -
ValidatorHash
must be 28 bytes long. -
PubKeyHash
(in all variants) must 28 bytes long.
Steps to reproduce the behavior
- Attempt to construct any value of any of the listed types which does not follow the specified invariant(s).
- Observe how the compiler accepts it.
- Try to use it on-chain.
- Watch things blow up.
Actual Result
Your transactions will fail, usually by erroring out.
Expected Result
If we can construct a value, it should be usable without erroring. If it has internal invariants, those should be documented at least; better still, the constructors should not be exposed.
Describe the approach you would take to fix this
At minimum, all such invariants must be documented clearly at the definition site of the type, ideally with reasons why they are so. If this is for stuff that is external to the type, links should be provided in the Haddocks to the relevant information. However, in a more ideal world, the constructors of such types should also either inhabit modules clearly marked as unsafe, or not be exposed at all in favour of safe construction methods, such as smart constructors or quasi-quoters.
System info
Plutus commit: 3f3f875764f2e329c8d738f99cdef0c1260ccde1
Do the UnsafeFromData instances verify these invariants?
And it seems they don't! This seems potentially problematic?
Let's say we have a state machine that is supposed to allow minting a token with a token name that an untrusted user specifies in a previous step. The token name would be put into the state (i.e. the datum) before being used by the code to validate the final transaction that mints the token and releases the funds in the state machine to some specific address.
The developer might (rightfully?) assume that the UnsafeFromData
instance verifies the TokenName
such that it is in fact possible to have a token with that name. However, currently, a user could supply a 33 byte token name, which would mean that the final transaction is impossible to construct, and thus the funds are locked forever.
This seems like a vulnerability in plutus-ledger-api?
These types are not intended to rule out invalid states, but merely to be lightweight newtypes that prevent confusion between values of different types. So more like newtype UserName = UserName Text
than newtype WellFormedPublicKey = WellFormedPublicKey ByteString
.
I agree that this should be made clear in the documentation, since you both thought otherwise!
@michaelpj This seems like a really strange decision to me. Even so, these invariants need to be documented very publically.
There's nothing wrong with lightweight newtypes, and they're perfectly useful at that job. The problem is indeed if people assume that they're more than that.
The severity of this issue will be apparent when the practical consequences happen.
People tend to assume a lot of things unfortunately, and also, people tend to not read the documentation.
Could we expose some smart constructors that validate these values?
Is it possible rename these newtype constructors to something containing Unsafe
without a new Plutus version?
Why is DatumHash
/TxId
/PubKeyHash
28 bytes? For example DatumHash
is described as:
-- | Script runtime representation of a @Digest SHA256@.
Why are these not 32 bytes? also what are the places where such invariants could be found already documented?
@zmrocze This is based on MLabs discoveries (usually through stuff blowing up). As far as I am aware, they are documented nowhere.
DatumHash
, RedeemerHash
, ValidatorHash
, MintingPolicyHash
, StakeValidatorHash
For types Datum
, Redeemer
, Validator
, StakeValidator
, MintingPolicy
from plutus-ledger-api/Plutus.V1.Ledger.Scripts there are functions in plutus-ledger/Ledger.Scripts calculating matching hashes.
-
DatumHash
,RedeemerHash
boil down to hashScriptData which is Blake2b_256 hash. There is little encoding needed to call this function. -
ValidatorHash
,StakeValidatorHash
andMintingPolicyHash
boil down to hashScript which is Blake2b_224 hash, therefore 28 bytes. To calculate hash a PLC script is encoded to bytestring with flat [see note].
So there is a mismatch with plutus-ledger-api docs saying these are sha256 hashes. Also if we were to hide these hash constructors then we should move hashing functions to plutus-ledger-api.
TxId
TxId
can be at least two different things:
- double-sha256 hash of a plutus transaction stripped from input witnesses. As documented.
- but also a blake2b_256 hash of a cardano transaction body, calculated using getCardanoTxId or getCardanoApiTxId, used for example in Tutorial.Escrow.
@michaelpj I just want to get some idea about where to go next, I think adding some docs would be a good first step, but I wonder if we could create smart constructors. I think, exposing some smart constructors that parse the input, and document them in the newtype constructors could save dApp developers from some shootguns.
Ideally, we would rename the newtype constructors to Unsafe... (e.g. PubKeyHash
-> UnsafePubKeyHash
), but as I mentioned earlier, I'm afraid that changing these would require a Plutus language update, am I correct?
@gege251 That wouldn't affect the underlying language, so no, it wouldn't need a new versionp
TokenName
must be no longer than 32 bytes, and be hex encoded.
I got a bit confused by this. What do you mean by "be hex encoded"? The IsString
impl for TokenName
(and code from the official tutorials) suggests we can use regular english words as TokenName
. That is, "ABC" :: TokenName
is valid. Is this not the case?
It seems that also CurrencySymbol
should be treated similarly to other hashes.
... and perhaps an empty byte (for Ada)?
As of now the hash types used are defined in cardano-ledger/StandardCrypto, where data hashes correspond to HASH
(therefore are be 32 byte) and script hashes correspond to ADDRHASH
(therefore 28 byte).
So:
-
DatumHash
,RedeemerHash
- 32 byte -
PubKeyHash
,ScriptHash
,ValidatorHash
,MintingPolicyHash
,StakeValidatorHash
- 28 byte -
CurrencySymbol
- empty or 28 byte -
TokenName
- 0 to 32 bytes -
TxId
- problematic: will be 32 byte, but in the wild will be a hash of various tx representations (see)
All of the above excluding TokenName
use hex encoding for IsString
, Show
and Pretty
, but can be any bytestring.
TokenName
uses utf-8 for IsString
and tries to use it for Show
(code)
Also I added this to code docs in #4597, but it sits so far.
Can we please prioritize this? We're having to write these checkers ourselves because these types are insufficient for their purpose.
If there's a genuine engineering reason why these types cannot change, can we at least get something to validate that our values are not total nonsense? This is having real impact on multiple projects -- not just developer time and an unreasonably high barrier to entry, but wasting money on audits when official tooling can't get this correct.
The cryptographic details used in the ledger are explained in appendix A of the Shelley ledger specification. In particular, A.1 Hashing
, explains which hashes are 32 bytes and which are 28. We use 28 bytes for those hashes that end up in addresses, and 32 everywhere else.
It would probably be worth adding an appendix to the other ledger specs (Multi-assets, Alonzo, and Babbage) to re-affirm this. Especially since sometimes it is not obvious, such as CurrencySymbol
. The CurrencySymbol
is a script hash (I'm surprised by this name, in fact, I thought we consistently called it the policy ID), and script hashes show up in addresses and are hence 28 bytes.
Thank you for bringing this to our attention.
Regarding inaccurate Haddock: I’ll make a PR to update the Haddock to include more accurate and useful information.
Regarding whether or not to expose the constructors: We are in the process of tidying up our plutus-ledger-api
package. It will eventually only contain the types that are actually used by the ScriptContext
. Everything else will be deprecated or moved to plutus-script-utils
in plutus-apps
. Developers can make use of plutus-script-utils
, which is under active development and defines utility functions for working with Plutus scripts. We thank you in advance for your patience while we finish this process, which will be after Vasil.
Regarding exposing “unsafe” constructors and adding some “safe” constructors: At the moment we would like to keep these types simple. This is deliberate and is consistent with our approach elsewhere. We do not take the responsibility of ensuring invariants. This is the responsibility of the developer. We will make it clear in the Haddock. We do value your opinions so I’ll make a separate issue about using smart constructors for some types to further discuss it.
Thanks for your response, Marty.
We do not take the responsibility of ensuring invariants. This is the responsibility of the developer.
I'd like to know whether this statement marks the official IOG designation of WONTFIX -- in speaking to Jared, Andrew, MPJ, and now yourself, there seems to some internal division on whether this should or shouldn't be considered within the domain of plutus-ledger-api
. I'm assuming this to be the case, since #4597 was closed without merging.
Sorry for the confusion Peter. We are working on a fix for the documentation and we will cherrypick the first 2 commits of #4597 into it. The third commit includes checking invariants which we haven't decided we want to make the switch yet. We will be collecting thoughts about that in an upcoming issue(I'll post a link here when it's ready). Please feel free to join the discussion there.
Just my two cents. If the decision is made to hide these constructors, they should still be importable from an "Internal" module to allow for the creation of type class instances.
If the newtypes don't mean anything, why not just make them type synonyms? That way there will be no confusion wrt. what they mean.
Here is the issue.