cardano-ledger
Arbitrary instances with alternate, valid, serializations.
The ledger is very careful never to re-serialize any data structure that will be hashed. This is important because CBOR allows multiple valid ways to encode many data structures. Developers who reuse some of the ledger code, however, are not always aware of how important this is. In particular, projects sometimes mix the ledger code with third-party serialization libraries, leading to confusion about why hashes do not match. See #2943 for an example.
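To make the hashing hazard concrete, here is a minimal Python sketch (standard library only, byte strings written out by hand rather than produced by a CBOR library) that encodes the same three-element list two valid ways and hashes both with Blake2b-256, the hash family Cardano uses for transaction IDs. The helper name `blake2b_256` is illustrative only.

```python
import hashlib

# The list [1, 2, 3] as a definite-length CBOR array:
# 0x83 = major type 4 (array) with length 3, followed by three small unsigned ints.
definite = bytes([0x83, 0x01, 0x02, 0x03])

# The same list as an indefinite-length array:
# 0x9f starts an indefinite-length array, 0xff is the "break" terminator.
indefinite = bytes([0x9F, 0x01, 0x02, 0x03, 0xFF])


def blake2b_256(data: bytes) -> str:
    """Return the hex Blake2b digest with a 32-byte (256-bit) output."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


# Both byte strings decode to the same list, yet their hashes differ,
# so hashing a re-serialization can silently change the result.
print(definite.hex(), blake2b_256(definite))
print(indefinite.hex(), blake2b_256(indefinite))
```

Any code that decodes the first form and re-encodes it as the second (or vice versa) before hashing will compute a different hash from the one the sender computed.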
Section 3.9 of the CBOR RFC (RFC 7049, "Canonical CBOR") describes the various choices CBOR allows, in the context of which choice to make if you want a canonical encoding.
The ledger itself, however, has to make specific choices about how to serialize, even though the deserializers are flexible.
In order to make it more transparent that some of the ledger serialization is arbitrary (using definite vs. indefinite lists, etc.), we can write our Arbitrary generators such that they vary these choices.
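A minimal sketch of such a "twiddling" generator follows. It is written in Python rather than as a Haskell QuickCheck `Arbitrary` instance so that it stays self-contained, and it assumes a tiny subset of CBOR: unsigned integers and nested lists. The names `encode_head` and `gen_encoding` are hypothetical; each list is independently encoded as definite or indefinite length.

```python
import random


def encode_head(major: int, arg: int) -> bytes:
    """Encode a CBOR initial byte plus its argument, using the shortest form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument too large for CBOR")


def gen_encoding(value, rng: random.Random) -> bytes:
    """Hypothetical generator: return one valid encoding of `value`, chosen at random.

    Supports non-negative ints and (nested) lists. Every list independently
    gets either a definite-length or an indefinite-length encoding.
    """
    if isinstance(value, int):
        if value < 0:
            raise ValueError("this sketch only handles unsigned ints")
        return encode_head(0, value)
    if isinstance(value, list):
        body = b"".join(gen_encoding(item, rng) for item in value)
        if rng.random() < 0.5:
            return encode_head(4, len(value)) + body  # definite-length array
        return b"\x9f" + body + b"\xff"  # indefinite-length array, then break
    raise TypeError(f"unsupported type: {type(value).__name__}")


rng = random.Random(2943)
seen = {gen_encoding([1, [2, 3]], rng).hex() for _ in range(200)}
print(sorted(seen))  # distinct encodings of the same value [1, [2, 3]]
```

A property test built on this idea would then check that decoding any generated encoding yields the same value, and that nothing downstream depends on which encoding was chosen.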
The structure that seems to lead to the most confusion is Datum. We could address Datums directly, or perhaps write a general way of "twiddling" all of our CBOR encodings.
If it's helpful, here are the three big encoding variances we've encountered and had to work around when interacting with browser and hardware wallets:
- definite / indefinite length arrays
- the ordering of map keys
- empty list vs omitted key in maps
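The three variances above can be reproduced on a toy map. This Python sketch hand-builds a hypothetical body `{1: [5], 13: []}` four ways; key 13 stands in for an optional list field such as collateral, and key 1 is an arbitrary placeholder. The first three variants differ only in encoding (the CBOR data model does not order map keys), while the fourth is a different CBOR value that is equivalent only under the ledger's reading of the optional field.

```python
import hashlib

# Hand-built encodings of a toy map {1: [5], 13: []} (keys are placeholders).
variants = {
    # Definite-length map (0xa2) with definite-length arrays (0x81, 0x80).
    "definite arrays": bytes.fromhex("a2" "01" "8105" "0d" "80"),
    # Same map, but each array is indefinite-length (0x9f ... 0xff).
    "indefinite arrays": bytes.fromhex("a2" "01" "9f05ff" "0d" "9fff"),
    # Same entries with the key order swapped (13 before 1).
    "keys reordered": bytes.fromhex("a2" "0d" "80" "01" "8105"),
    # Key 13 omitted entirely: a one-entry map (0xa1).
    "empty key omitted": bytes.fromhex("a1" "01" "8105"),
}

for name, data in variants.items():
    digest = hashlib.blake2b(data, digest_size=32).hexdigest()
    print(f"{name:18} {data.hex():20} {digest[:16]}")
```

Each variant yields different bytes and therefore a different hash, which is why a wallet and a node that pick different variants must each hash the bytes that were actually signed.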
Regarding "The ledger itself, however, has to make specific choices about how to serialize, even though the deserializers are flexible": isn't it also very important that we define the canonical, deterministic serialization choices we made, so that third parties can reproduce the hashes?
@yihuang no, it's important that people not be misled into thinking that there is a canonical representation. There is not a canonical representation. When you sign or hash the data, you must only hash the original bytes, and not a re-serialisation.
It's a very bad security practice to check signatures or hashes on re-serialised data. It must only be done on the original bytes. Otherwise it leads to all sorts of nasty security problems (think txs with different bytes but that have the same hash).
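One way to follow this rule is for the decoder to return the exact byte span it consumed alongside the decoded value, and to hash only that span. The Python sketch below assumes a tiny CBOR subset (unsigned ints and arrays); `decode_with_bytes` and `reencode` are hypothetical helpers for illustration, not cardano-ledger APIs.

```python
import hashlib


def _read_arg(data: bytes, pos: int):
    """Read a CBOR initial byte's argument; return (arg, new_pos).

    arg is None when the additional info is 31 (indefinite length).
    """
    info = data[pos] & 0x1F
    pos += 1
    if info < 24:
        return info, pos
    if 24 <= info <= 27:
        size = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
        return int.from_bytes(data[pos:pos + size], "big"), pos + size
    if info == 31:
        return None, pos
    raise ValueError(f"reserved additional info {info}")


def _decode(data: bytes, pos: int):
    """Decode one item (unsigned int or array); return (value, end_pos)."""
    major = data[pos] >> 5
    arg, pos = _read_arg(data, pos)
    if major == 0 and arg is not None:
        return arg, pos
    if major == 4:
        items = []
        if arg is None:  # indefinite length: read items until the 0xff break
            while data[pos] != 0xFF:
                item, pos = _decode(data, pos)
                items.append(item)
            return items, pos + 1
        for _ in range(arg):
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unsupported major type {major}")


def decode_with_bytes(data: bytes, start: int = 0):
    """Hypothetical helper: return (value, exact bytes the item occupied)."""
    value, end = _decode(data, start)
    return value, data[start:end]


def reencode(value) -> bytes:
    """Naive re-serializer (definite lengths, small values only), for contrast."""
    if isinstance(value, int) and 0 <= value < 24:
        return bytes([value])
    if isinstance(value, list) and len(value) < 24:
        return bytes([0x80 | len(value)]) + b"".join(reencode(v) for v in value)
    raise ValueError("outside this sketch's tiny subset")


def blake2b_256(data: bytes) -> str:
    return hashlib.blake2b(data, digest_size=32).hexdigest()


received = bytes.fromhex("9f0102ff")  # the sender chose an indefinite-length array
value, original = decode_with_bytes(received)
print(value)  # [1, 2]
print(blake2b_256(original) == blake2b_256(received))  # True: hash of what was sent
print(reencode(value).hex())  # 820102: different bytes, so a different hash
```

Hashing `original` reproduces the sender's hash; hashing `reencode(value)` does not, even though both describe the same list.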
But shouldn't there be a specification of some kind? Or do you mean the only specification is the Haskell code itself? What about clients implemented in other languages?
@yihuang we have a wire specification (CDDL) for every ledger era. See the table at the top of the readme in this repository. For example, the latest one is here: https://github.com/input-output-hk/cardano-ledger/blob/11e4d4a8ac88adf33baf6b0602635bf37a53803e/eras/babbage/test-suite/cddl-files/babbage.cddl
I mean a specification of the serialization details, so that third parties can reproduce the same result. Is there a complete document on it that I'm not aware of?
No, there is no such document. There is also no good reason for anyone to try to reproduce the exact arbitrary choices our code makes that are not captured by the CDDL spec. Anyone who does is likely doing exactly what we are trying to prevent. See https://github.com/input-output-hk/cardano-ledger/issues/2943#issuecomment-1203989504
I can think of at least one case, though: implementing Cardano in different languages. Or do you mean that even if an alternative client doesn't serialize a block in exactly the same way, it still works, because the other nodes won't try to re-serialize it? Hmm, if that's the case, that would make sense.
Exactly! If everyone conforms to the CDDL spec and does not re-serialize, then everything works.
Exactly. The spec is the CDDL, and there is no need for other interoperable implementations to serialise in exactly the same way so long as they follow the CDDL specification.
@Quantumplation, what do you mean by
* empty list vs omitted key in maps
@Soupstraw Consider collateral, for example. If no collateral is specified, the cardano-api code will serialize the transaction body as a map, with key 13 set to an empty list:
84 ; Array of 4 elements
a8 ; Map with 8 keys
...
0d 9f ff ; key 13, indefinite-length array; break (empty array)
whereas it is also a valid encoding (and the one preferred by hardware wallets, it appears) to just leave the 13 key out of the map entirely:
84 ; Array of 4 elements
a7 ; Map with 7 keys
...
The same thing applies to a few other fields, IIRC
Right, so what @Quantumplation is referring to is not an ambiguity in CBOR, but rather in our CDDL. We often have optional keys in maps, and there is no semantic difference between leaving a key out and including it with an mempty value.
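Because the two forms carry the same meaning, a consumer that needs to compare transaction bodies semantically can normalize the decoded maps instead of comparing bytes. A minimal Python sketch, assuming the bodies are already decoded into dicts keyed by integer field numbers; `OPTIONAL_LIST_KEYS` and `normalize` are hypothetical names, treating key 13 (collateral) as such a field follows the example above, and key 0 with its value is a placeholder for the elided fields.

```python
# Hypothetical set of optional map keys whose value is a list, where an empty
# list means the same thing as omitting the key (13 = collateral in the example).
OPTIONAL_LIST_KEYS = frozenset({13})


def normalize(body: dict) -> dict:
    """Drop optional list-valued keys whose value is empty.

    Intended only for semantic comparison: never hash or sign the result,
    since hashes must be computed over the original bytes.
    """
    return {
        key: value
        for key, value in body.items()
        if not (key in OPTIONAL_LIST_KEYS and value == [])
    }


# Decoded forms of the two encodings (other fields elided; key 0 is a placeholder):
with_empty = {0: ["input"], 13: []}  # key 13 present with an empty list
omitted = {0: ["input"]}             # key 13 left out entirely

print(normalize(with_empty) == normalize(omitted))  # True
```

Comparing Python dicts also ignores key order, so this kind of check is insensitive to the map-key ordering variance as well.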
Ahh, I see, thanks for clarifying!