cardano-ledger icon indicating copy to clipboard operation
cardano-ledger copied to clipboard

Fix Byron address format

Open SebastienGllmt opened this issue 3 years ago • 6 comments

The CDDL definition for Byron era has some mistakes in it:

  1. It marked addrdistr as option. However, you can see that in the written spec, the field is not optional image

I'm a little confused by the Rust implementation which marks the field as required but at the same time allows it to be missing during deserialization (setting BootstrapEraDistr as the default value) so maybe there is some history I am missing here (if this is intended, then maybe the .default type should be used in the CDDL definition)

EDIT: it turns out Dd addresses always contain the map, but Ae2 addresses just contain an empty map instead (which thus need a default stake_distribution)

  1. Both the written spec and the binary spec are missing the protocolMagic attribute in addresses. Potentially the cause of this is because the protocolMagic was only added partway through the Byron era (notably, added when the public testnet was released) and it was only used on testnets.

You could argue that maybe this field is included in the Unparsed :: Word8 ⇀ Bytes } mapping used for attributes in the written spec, but this unparsed mapping is also missing from the binary spec so it's better to have protocolMagic than nothing I guess

  1. The checksum for address is marked as u64, but it should be u32 since crc32 is a 32-bit type

SebastienGllmt avatar Jul 20 '22 11:07 SebastienGllmt

I believe the actual definitions for addrattr and protocolMagic actually should be

addrattr = {   0 : bytes .cbor addrdistr
           , ? 1 : bytes                   ; HDAddressPayload (but may contain anything)
           , ? 2 : bytes .cbor protocolMagic
           }

I haven't been able to validate this because currently our codegen tool doesn't support the .cbor CDDL keyword so I can't see if it generates the same (de)serialization code

SebastienGllmt avatar Jul 20 '22 13:07 SebastienGllmt

This is the CDDL definition that worked for me

blake2b-224 = bytes .size 28

addressId = blake2b-224 ; blake2b-224 of sha256 of cbor(addrtype, spendingData, addrAttributes)
stakeholderId = blake2b-224

protocolMagic = u32
HDAddressPayload = bytes

; Addresses

singleKeyDistr = (0, stakeholderid)
bootstrapEraDistr = (1, uint)

stakeDistribution =
  [  bootstrapEraDistr
  // singleKeyDistr
  ]

publicEd25519Bip32 = bytes .size 64
script = bytes .size 32
publicEd25519 = bytes .size 32

spendingDataPubKeyASD = (0, publicEd25519Bip32)
spendingDataScriptASD = (1, script)
spendingDataRedeemASD = (2, publicEd25519)

spendingData = [
     spendingDataPubKeyASD
  // spendingDataScriptASD
  // spendingDataRedeemASD
]


addrType = 0 ; Public Key
         / 1 ; Script
         / 2 ; Redeem

addrAttributes = {
             ? 0 : bytes .cbor stakeDistribution         ; no value -> BootstrapEraDistr
           , ? 1 : bytes .cbor HDAddressPayload      ; strictly speaking, this may contain anything
           , ? 2 : bytes .cbor protocolMagic
           }

addressContent = [addressid, addrAttributes, addrtype]

crc32 = u32
address = [ #6.24(bytes .cbor addressContent), crc32 ]

SebastienGllmt avatar Jul 25 '22 17:07 SebastienGllmt

I'm not familiar with the Byron serialization code at all, but I had a look to see if I could figure out how it worked.


The FromCBOR instance for the Address Attributes is here: https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/AddrAttributes.hs#L108-L110

You can see that the bulk of the work is done by fromCBORAttributes, which takes an initial value and an iterating function (so like a fold). The are only two known fields in the Address Attributes: https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/AddrAttributes.hs#L41-L44

The initial values sets the derivation path to Nothing and the network magic to NetworkMainOrStage. The iterating function (named go) only knows how to parse fields 1 and 2, corresponding respectively to derivation path and the network magic. Note the absence of something corresponding to key 0, addrdistr.

Any fields that the iterating function (called go in the the FromCBOR instance, and updater inside fromCBORAttributes) does not know how to handle are ignored by fromCBORAttributes, but persist inside attrRemain: https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/Attributes.hs#L214-L225

Perhaps the correct CDDL description is:

addrattr = { ? 1 : bytes
           , ? 2 : protocolMagic }

Note that the ToCBOR instance also does not serialize the 0 key:

https://github.com/input-output-hk/cardano-ledger/blob/14e1bcc89e275600efb8b66c7cefeebfb1764204/eras/byron/ledger/impl/src/Cardano/Chain/Common/AddrAttributes.hs#L83


Of course, we shouldn't guess, but actually test this. This was just my first pass to try to understand...

JaredCorduan avatar Jul 25 '22 17:07 JaredCorduan

For your above point, I'll try and look into this

Related, but I also noticed that the definition of byron witnesses in post-Shelley (https://github.com/input-output-hk/cardano-ledger/blob/master/eras/babbage/test-suite/cddl-files/babbage.cddl#L356) should be bytes .cbor addrAttributes

SebastienGllmt avatar Aug 03 '22 05:08 SebastienGllmt

I noticed that in the definition address = [ #6.24(bytes .cbor ([addressid, addrattr, addrtype])), u64 ], the u64 at the end is actually a crc32. What's strange is that the crc32 should be a u32 (as its name implies). Does the node actually allow up to u64 values here?

SebastienGllmt avatar Aug 06 '22 05:08 SebastienGllmt

I noticed that in the definition address = [ #6.24(bytes .cbor ([addressid, addrattr, addrtype])), u64 ], the u64 at the end is actually a crc32. What's strange is that the crc32 should be a u32 (as its name implies). Does the node actually allow up to u64 values here?

Great observation, thank you. It looks like the node does not allow up to u64 as the wire spec says, but in fact u32 as one would expect:

https://github.com/input-output-hk/cardano-ledger/blob/7f7a8ae757a5f60ed759efed8b0bc721277b95f7/eras/byron/ledger/impl/src/Cardano/Chain/Common/CBOR.hs#L117-L132

Note that actualCrc is a Word32, that expectedCrc is the value read from the wire, and that both values have the same type since they are compared without conversion.

It would still be great to test this to be certain.

JaredCorduan avatar Aug 07 '22 21:08 JaredCorduan

I've made #3083 to track the issues raised in this PR (and will close this PR until we have answer to the outstanding questions).

JaredCorduan avatar Oct 13 '22 14:10 JaredCorduan