cardano-ledger
cardano-ledger copied to clipboard
Fix Byron address format
The CDDL definition for Byron era has some mistakes in it:
- It marked
addrdistras option. However, you can see that in the written spec, the field is not optional
I'm a little confused by the Rust implementation which marks the field as required but at the same time allows it to be missing during deserialization (setting BootstrapEraDistr as the default value) so maybe there is some history I am missing here (if this is intended, then maybe the .default type should be used in the CDDL definition)
EDIT: it turns out Dd addresses always contain the map, but Ae2 addresses just contain an empty map instead (which thus need a default stake_distribution)
- Both the written spec and the binary spec are missing the
protocolMagicattribute in addresses. Potentially the cause of this is because the protocolMagic was only added partway through the Byron era (notably, added when the public testnet was released) and it was only used on testnets.
You could argue that maybe this field is included in the Unparsed :: Word8 ⇀ Bytes } mapping used for attributes in the written spec, but this unparsed mapping is also missing from the binary spec so it's better to have protocolMagic than nothing I guess
- The checksum for
addressis marked as u64, but it should be u32 since crc32 is a 32-bit type
I believe the actual definitions for addrattr and protocolMagic actually should be
addrattr = { 0 : bytes .cbor addrdistr
, ? 1 : bytes ; HDAddressPayload (but may contain anything)
, ? 2 : bytes .cbor protocolMagic
}
I haven't been able to validate this because currently our codegen tool doesn't support the .cbor CDDL keyword so I can't see if it generates the same (de)serialization code
This is the CDDL definition that worked for me
blake2b-224 = bytes .size 28
addressId = blake2b-224 ; blake2b-224 of sha256 of cbor(addrtype, spendingData, addrAttributes)
stakeholderId = blake2b-224
protocolMagic = u32
HDAddressPayload = bytes
; Addresses
singleKeyDistr = (0, stakeholderid)
bootstrapEraDistr = (1, uint)
stakeDistribution =
[ bootstrapEraDistr
// singleKeyDistr
]
publicEd25519Bip32 = bytes .size 64
script = bytes .size 32
publicEd25519 = bytes .size 32
spendingDataPubKeyASD = (0, publicEd25519Bip32)
spendingDataScriptASD = (1, script)
spendingDataRedeemASD = (2, publicEd25519)
spendingData = [
spendingDataPubKeyASD
// spendingDataScriptASD
// spendingDataRedeemASD
]
addrType = 0 ; Public Key
/ 1 ; Script
/ 2 ; Redeem
addrAttributes = {
? 0 : bytes .cbor stakeDistribution ; no value -> BootstrapEraDistr
, ? 1 : bytes .cbor HDAddressPayload ; strictly speaking, this may contain anything
, ? 2 : bytes .cbor protocolMagic
}
addressContent = [addressid, addrAttributes, addrtype]
crc32 = u32
address = [ #6.24(bytes .cbor addressContent), crc32 ]
I'm not familiar with the Byron serialization code at all, but I had a look to see if I could figure out how it worked.
The FromCBOR instance for the Address Attributes is here:
https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/AddrAttributes.hs#L108-L110
You can see that the bulk of the work is done by fromCBORAttributes, which takes an initial value and an iterating function (so like a fold). The are only two known fields in the Address Attributes:
https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/AddrAttributes.hs#L41-L44
The initial values sets the derivation path to Nothing and the network magic to NetworkMainOrStage. The iterating function (named go) only knows how to parse fields 1 and 2, corresponding respectively to derivation path and the network magic. Note the absence of something corresponding to key 0, addrdistr.
Any fields that the iterating function (called go in the the FromCBOR instance, and updater inside fromCBORAttributes) does not know how to handle are ignored by fromCBORAttributes, but persist inside attrRemain:
https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/Attributes.hs#L214-L225
Perhaps the correct CDDL description is:
addrattr = { ? 1 : bytes
, ? 2 : protocolMagic }
Note that the ToCBOR instance also does not serialize the 0 key:
https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/AddrAttributes.hs#L83
Of course, we shouldn't guess, but actually test this. This was just my first pass to try to understand...
For your above point, I'll try and look into this
Related, but I also noticed that the definition of byron witnesses in post-Shelley (https://github.com/input-output-hk/cardano-ledger/blob/master/eras/babbage/test-suite/cddl-files/babbage.cddl#L356) should be bytes .cbor addrAttributes
I noticed that in the definition address = [ #6.24(bytes .cbor ([addressid, addrattr, addrtype])), u64 ], the u64 at the end is actually a crc32. What's strange is that the crc32 should be a u32 (as its name implies). Does the node actually allow up to u64 values here?
I noticed that in the definition
address = [ #6.24(bytes .cbor ([addressid, addrattr, addrtype])), u64 ], theu64at the end is actually acrc32. What's strange is that the crc32 should be a u32 (as its name implies). Does the node actually allow up to u64 values here?
Great observation, thank you. It looks like the node does not allow up to u64 as the wire spec says, but in fact u32 as one would expect:
https://github.com/input-output-hk/cardano-ledger/blob/7f7a8ae757a5f60ed759efed8b0bc721277b95f7/eras/byron/ledger/impl/src/Cardano/Chain/Common/CBOR.hs#L117-L132
Note that actualCrc is a Word32, that expectedCrc is the value read from the wire, and that both values have the same type since they are compared without conversion.
It would still be great to test this to be certain.
I've made #3083 to track the issues raised in this PR (and will close this PR until we have answer to the outstanding questions).