bitcoinfuzz icon indicating copy to clipboard operation
bitcoinfuzz copied to clipboard

BIP32: Add Deserialize extended key target

Open kuliq23 opened this issue 2 months ago • 6 comments

NEW TARGET: Deserialization of a BIP32 extended key

This target accepts a Bip32 extended key in the base58check-encoded form (string of b58 chars), deserializes it, then returns it as a string (return may change to bool, or different parts of the key). In order to make it even remotely possible to send a valid extkey, formatting of the random bytes is needed.

TLDR bip32 ExtKey serialization from https://en.bitcoin.it/wiki/BIP_0032

Extended public and private keys are serialized as follows:

  • 4 byte: version bytes (mainnet: 0x0488B21E public, 0x0488ADE4 private; testnet: 0x043587CF public, 0x04358394 private)
  • 1 byte: depth: 0x00 for master nodes, 0x01 for level-1 derived keys, ....
  • 4 bytes: the fingerprint of the parent's key (0x00000000 if master key)
  • 4 bytes: child number. This is ser32(i) for i in xi = xpar/i, with xi the key being serialized. (0x00000000 if master key)
  • 32 bytes: the chain code
  • 33 bytes: the public key or private key data (serP(K) for public keys, 0x00 || ser256(k) for private keys)

This 78 byte structure can be encoded like other Bitcoin data in Base58, by first adding 32 checksum bits (derived from the double SHA-256 checksum), and then converting to the Base58 representation. This results in a Base58-encoded string of up to 112 characters. Because of the choice of the version bytes, the Base58 representation will start with "xprv" or "xpub" on mainnet, "tprv" or "tpub" on testnet.

My idea was to create multiple custom mutators, each "helping the target" a bit more, to maximize the range tested. (e.g. Mut1 - encode random bytes to b58, Mut2 - hardcode the first 4 version bytes + Mut1, Mut3 - fill to 78 bytes + Mut2, Mut4 - append correct checksum + Mut3, ...)

I see 2 issues with this approach:

1. Base58 utils access

For these mutations, base58 functions are needed, so far i only added a placeholder mutator that creates seemingly b58 looking, 78B long data. I see 3 solutions:

  • Custom implementation in main.cpp (similarly to bitcoin_double_sha256() in main.cpp)
  • "Borrow" it from Core "modules/bitcoin/base58.h"
  • Add the base58 utils to "modules/custommutator/" (similarly to how bech32 and sha256 are there) <- optimal?

2. How mutators work

Please correct me if i am wrong:

As mutators are selected in the building process, running the fuzzer means that only an explicit set of them can be active for that running instance at a time.... Does that mean that if we e.g. want to fuzz "long-term", multiple instances of bitcoinfuzz need to be running to cover all targets? (each with a specified mutator?) Is this something to avoid (shall I reavaluate the multi-mutator solution and instead e.g. implement multiple targets that begin with mutating the input?

Thanks.

kuliq23 avatar Oct 24 '25 15:10 kuliq23

Add the base58 utils to "modules/custommutator/" (similarly to how bech32 and sha256 are there) <- optimal?

Yes, it's the best way to go.

Does that mean that if we e.g. want to fuzz "long-term", multiple instances of bitcoinfuzz need to be running to cover all targets? (each with a specified mutator?)

Yes, it's very common to have different binaries for fuzzing, since when fuzzing we normally compile it with the intent of running a specific target. Per example, if I am fuzzing the target deserialize_invoice doesn't make too much sense to compile the final binary with the rust-bitcoin and bitcoin core.

Is this something to avoid (shall I reavaluate the multi-mutator solution and instead e.g. implement multiple targets that begin with mutating the input?

In this case that we can reuse the custom mutator for more than one target (xpub and xpriv). We could have the same approach as the:

https://github.com/bitcoinfuzz/bitcoinfuzz/blob/v2/main.cpp#L245

Where we basically change the custom mutator behavior based in a runtime flag. const char* mt_env = std::getenv("P2P_LIGHTNING_MESSAGE_TYPE"); .

I think the custom mutator for those extended key targets should have these steps:

  1. Read environment flag to decide xpub vs xpriv
  2. Decode base58 (remove checksum) -> raw 78 bytes
  3. Mutate the 78-byte structure with LibFuzzer
  4. Force the correct version bytes (first 4 bytes) based on key type (after mutation ensure that it will have the correct prefix)
  5. Encode back to base58
  6. Recalculate and add checksum (double SHA-256, first 4 bytes)

erickcestari avatar Oct 27 '25 14:10 erickcestari

Excellent, I'll get to work!

kuliq23 avatar Oct 29 '25 15:10 kuliq23

  1. Decode base58 (remove checksum) -> raw 78 bytes

Does this mean that the initial fuzz_data that the mutator receives is a (valid) key encoded in base58? I thought the mutator receives random bytes and works with them... Is this to do with the "corpus"? (Does the corpus provide an initial set of correct values?)

  1. Encode back to base58
  2. Recalculate and add checksum (double SHA-256, first 4 bytes)

nit: I believe these steps should be switched to comply with the scheme.

kuliq23 avatar Nov 02 '25 11:11 kuliq23

Does this mean that the initial fuzz_data that the mutator receives is a (valid) key encoded in base58? I thought the mutator receives random bytes and works with them... Is this to do with the "corpus"? (Does the corpus provide an initial set of correct values?)

The fuzz_data depends on the fuzzer's state: initially when there's no corpus, the fuzzer generates random bytes, but once a corpus exists, it uses inputs from the corpus which are valid BIP32-formatted keys (base58-encoded with checksums, since our custom mutator ensures this format). In both cases, the mutator attempts to decode the input to raw bytes, and if decoding fails (e.g., for random initial bytes), we either return early or fall back to a minimal valid input (see the code here). Once we have raw bytes, the mutator modifies them and then re-serializes the result into proper BIP32 format with a valid checksum, ensuring that even mutations of random data eventually produce structurally valid keys for the target function to process.

  1. Encode back to base58
  2. Recalculate and add checksum (double SHA-256, first 4 bytes)

nit: I believe these steps should be switched to comply with the scheme.

Yes, true

erickcestari avatar Nov 02 '25 12:11 erickcestari

nit: Would it be better to follow the ...extended_key... naming instead of xprv and xpub? I guess also tprv and tpub could be tested. So, e.g.:

  • BIP32DeserializeKeyXpubXprvTarget -> BIP32DeserializeExtendedKeyTarget
  • bip32_deserialize_key_xpub_xprv -> bip32_deserialize_extended_key

quapka avatar Nov 07 '25 17:11 quapka

@kuliq23, please, keep an eye on https://github.com/bitcoinfuzz/bitcoinfuzz/pull/336. If that gets merged before this one, you'll likely need to move your mutator-related files around as well.

quapka avatar Nov 25 '25 09:11 quapka

Ready for review

  • Updated so that deserialization target does not depend on the serialization functionality of each module and returns a string of this format:
$"depth={depthHex};fp={fingerprintHex};child={childHex};chaincode={chainHex};key={keyHex}"
----
depth=00;fp=00000000;child=00000000;chaincode=0000000000000000000000000000000000000000000000000000000000000000;key=0000000000000000000000000000000000000000000000000000000000000000
  • Removed BTCD, as NewExtendedKey() performes no checks and should not be used.

  • Moved CUSTOMMUTATOR_EXTENDED_KEY to custommutator/mutators/extended_key.h

  • FIND: NBitcoin deserializes extkeys that RustBitcoin and BitcoinCore refuse, namely e.g.:

tpubD6NzVbkrk32yV5x4sxuLgohVBbYnknoUqt4KZPF6s2LJpAJA1e3gTbYp4pzHs1QWWsuUdWiAYzmvMPE9YohUybXvF3ur46sW2quurdpqgS5

xpub662eSZPVhg7X1bDRQ1W3nSCz9e3qsT2RDMKMAh5izNfwMPqquLs2mLMjw4tw5ZuUMUNfdwzF4S9fpP857tQnBCxXTRoTcFnyNTunw7kdqTS

tprv8ZgxMBicTyKp8ejpC3WEY9jnMR8ddW8igEyibLiqdvYDMrC7Sp2D7enWsSFhV3s9X1nxtPsXR3V2fXfe9NfmRRrMs7WMER5Trx3PKs4Ekas

xprv9s21bJh6f2qNHs99bFx5N2guoY8kKZaeUfpddvVBG1fwdRQNTRzEr4ABu9TKHGgZ8YXD7CCQPp8YSBMjDRBstw2dLtuN9sPRdZMh7ReMcCv

I will investigate this further.

  • Bitcoin Core target could be refactored into a more polished version, I'm having trouble with includes in module.h, this does not block the merge

kuliq23 avatar Dec 01 '25 18:12 kuliq23

I'm going to review it soon. Have you investigate your find? Also, be cautious on publicly posting any findings due to security concerns.

The conflicts that must be resolved is because we added a code formatter check recently, sorry for the noise.

brunoerg avatar Dec 09 '25 18:12 brunoerg

Needs rebase

brunoerg avatar Dec 11 '25 10:12 brunoerg