Differential fuzzing of BIP32 key derivations
Hi all, thanks for the interesting project. During Spring, I've stumbled upon a serialization problem of extended keys of depth over 255 (thus not fitting into a single byte and overflowing). The problem occurred in a number of libraries, see https://github.com/bitcoin/bitcoin/issues/32201. Based on this finding I got interested in doing differential fuzzing of BIP32 derivations across wide-range of targets (various languages). Also, from what I've seen the key derivation API is broad-ish, leading to errors like https://github.com/MetacoSA/NBitcoin/issues/1259. This motivates closer look and fuzz test the different key derivation API options.
Currently, together with a student of mine, we are working on a proof of concept fuzzer, but I would like to hear your opinion on whether extending bitcoinfuzz would seem like a good fit for you. Our goal is to support wide range of target implementations (from various languages), not just the ones currently present in your project. Possibly, including also open-source hardware wallets firmwares, etc.
Hi, @quapka. It looks interesting!
I think we could have a target here in bitcoinfuzz for BIP32 key derivations. I don't think you would need to spend time creating another project just for this, seems fine and not hard to write this target here. I think the easiest path is to start by creating the target with the existing modules we have here in bitcoinfuzz (e.g. Bitcoin Core, NBitcoin, rust-bitcoin). Then, we could work on integrating other relevant projects here (this usually takes more time) - I'm really interested on having bitcoinj, libbitcoin (which there is an open PR integrating it), and others, here. If you or your student want to work on it here, I'm totally open to help and guide you. If not, I will probably do it but not in short-time since we have many things on our backlog and we're basically no more than 3 people working on this project.
Indeed, it sounds better to contribute to an existing project.
I've checked the overall design - could you double check that I got it right?
- basemodule.cpp is where would would supply new fuzzing entry points for the BIP32 keys, e.g.,
extended_pub_key_deserialize, etc. I can imagine there being several such functions - some specific to only certain libraries. - modules/ holds the individual targets. There we could either add a new modules or enhance the existing
module.cppand the actual implementation. I take it that for the build API it's enough ifmakewithin the module createsmodule.athat is then statically linked? - driver.cpp is where the testing happens, e.g. here for script_parse(). The result analysis is pair-wise, it (I wonder) stops the whole process and fails? Are the results not automatically saved anywhere (e.g., in JSON), but just printed out?
- What part of the fuzzer handles that not all libs support all functionality, e.g., LND only has two calls?
And a couple of other questions:
- Currently, I don't see support for static test vectors. Would it be ok to add these? For BIP32, think test vectors from the specifications and other, manually crafted, edge-cases.
- Would you be okay with us adding various range of languages (C/C++, Java, Rust, Go, interpreted ones,..)? There are lots of projects that only support BIP32 derivations.
- I see some custom mutators, I haven't studied it that well, but does
bitcoinfuzzsupport some kind of templates for the fuzzing? - Regarding terminology, you refer to the implementations (e.g., bitcoin-core) as modules and to the fuzzed functions as targets, right?
- The targeted versions of the modules are currently hard-coded, within the modules source, correct? Meaning, you can't test different versions of the same implementation against each other. I have some experience with using Nix, which could help automate building the statically-linked libs (
module.a) in different versions, if you'd be interested. The individual modules' loading seems to be hardcoded so that would need to be updated a bit as well. - [Update] How/where would you handle if a module supports various ways of implementing a given target (with the exact expected result)? I've seen various ways on how to derive a key based on the path.
Feel free to answer in parts.. thanks!
basemodule.cpp is where would would supply new fuzzing entry points for the BIP32 keys, e.g., extended_pub_key_deserialize, etc. I can imagine there being several such functions - some specific to only certain libraries.
Yes, exactly. Here we define the targets that each library(module) may or not implement.
modules/ holds the individual targets. There we could either add a new modules or enhance the existing module.cpp and the actual implementation. I take it that for the build API it's enough if make within the module creates module.a that is then statically linked?
Yes, we only need a module.a static library that implements BaseModule, which will then be linked statically into the final bitcoinfuzz binary.
driver.cpp is where the testing happens, e.g. here for script_parse(). The result analysis is pair-wise, it (I wonder) stops the whole process and fails?
Yes, this causes the program to crash. When it crashes, the fuzzer (libFuzzer or AFL) detects the input that triggered the failure and saves it for us. The prints provide context by showing what happened, whether different outputs were produced, or whether one implementation succeeded while another failed. (note some target doesn't have any print at all)
Are the results not automatically saved anywhere (e.g., in JSON), but just printed out?
No, the fuzzer saves the results automatically as raw input bytes.
What part of the fuzzer handles that not all libs support all functionality, e.g., LND only has two calls?
All modules inherit default BaseModule implementations. Each target calls the relevant module method. If a module doesn’t implement it, the default returns std::nullopt, so the target skips that module’s output. When two or more modules return a value, the driver asserts they match. Note: it is not recommended to run a target with modules that do not implement the corresponding function.
per example:
void Driver::TransactionEvalTarget(std::span<const uint8_t> buffer) const
{
std::optional<std::string> last_response{std::nullopt};
std::string last_module_name;
for (auto &module : modules) // modules chosen via CXXFLAGS
{
std::optional<std::string> res{module.second->transaction_eval(buffer)};
if (!res.has_value()) continue; // default impl -> std::nullopt, skip
if (last_response.has_value()) assert(*res == *last_response);
last_response = res.value();
last_module_name = module.first;
}
}
Note: I am not a mantainer to determine what can be added or not, but these are my thoughts:
- Currently, I don't see support for static test vectors. Would it be ok to add these? For BIP32, think test vectors from the specifications and other, manually crafted, edge-cases.
We don’t hardcode those tests. The purpose of coverage-guided fuzzing is for the fuzzer to “learn” the target function by tracking code coverage and generating new inputs that explore additional paths, which naturally exposes edge cases. Static vectors are more useful as an initial corpus to bootstrap the fuzzer, helping it reach deeper code paths and uncover bugs more quickly.
- Would you be okay with us adding various range of languages (C/C++, Java, Rust, Go, interpreted ones,..)? There are lots of projects that only support BIP32 derivations.
Yes, we already have modules in those languages, so checking how they are built should help you add new ones. We currently have modules in C/C++, Kotlin, Scala, Rust, Go, Python and C#
- I see some custom mutators, I haven't studied it that well, but does
bitcoinfuzzsupport some kind of templates for the fuzzing?
Yes, it will depend of which fuzzer you will use, but currently we only support custom mutators for LibFuzzer for few targets. Those custom mutators will help the fuzzer mutate the input and create a mutated input based in some structure.
See: https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md
Per example the CUSTOM_MUTATOR_P2P_MESSAGE mutate the input and then fix the checksum. It's needed because the chance of the LibFuzzer mutates the input and create a valid checksum is almost 0.
size_t LLVMFuzzerCustomMutator(uint8_t *fuzz_data, size_t size, size_t max_size,
unsigned int seed) {
// First, mutate the data using LibFuzzer's default mutator
size_t new_size = LLVMFuzzerMutate(fuzz_data, size, max_size);
// Then fix the checksum if the data looks like it could be a Bitcoin message
fix_bitcoin_checksum(fuzz_data, new_size);
return new_size;
}
- Regarding terminology, you refer to the implementations (e.g., bitcoin-core) as modules and to the fuzzed functions as targets, right?
Yes.
- The targeted versions of the modules are currently hard-coded, within the modules source, correct? Meaning, you can't test different versions of the same implementation against each other. I have some experience with using Nix, which could help automate building the statically-linked libs (
module.a) in different versions, if you'd be interested. The individual modules' loading seems to be hardcoded so that would need to be updated a bit as well.
Yes, that would be really good to have, since right now we can't test different versions of the same implementation against each other.
- [Update] How/where would you handle if a module supports various ways of implementing a given target (with the exact expected result)? I've seen various ways on how to derive a key based on the path.
We can either create separate targets or, within a single target, call multiple methods of deriving the same key and add asserts in the module to verify that the results match. (which I think is better)
Thanks for the confirmation and insights. We discussed that and it makes sense to us to work on adding bip32-related targets to bitcoinfuzz. We plan to start with adding some bip32-dummy targets for Rust bitcoin module and see where we get.
The work has started. Currently, implementing BIP32 derivations to get a better grasp of the project in this fork.
How would you like us to structure the PRs? One PR per (target, module) pair or implement at least two modules at once, so the driver can make comparisons, and then add more incrementally? The latter makes more sense to me.
At this point, we don't have fixed targets, so there's no rush. Btw. in case you yourself would have ideas regarding BIP32 targets (we'll make a list of our own ideas), we are happy to discuss them.
Nice! Thanks for starting this! :)
How would you like us to structure the PRs? One PR per (target, module) pair or implement at least two modules at once, so the driver can make comparisons, and then add more incrementally? The latter makes more sense to me.
I think the second option is a better approach. As you said, we could already start doing differential fuzzing between two different modules.
At this point, we don't have fixed targets, so there's no rush. Btw. in case you yourself would have ideas regarding BIP32 targets (we'll make a list of our own ideas), we are happy to discuss them.
Be free to post the list here and then we can discuss about it.
Hi, student of @quapka here! Thank you for all the useful comments and lots of help. I got a question. Is bitcoinfuzz being run/running somewhere periodically/constantly? It would help to know whether we need to set up a dedicated machine for my project or not. Thanks in advance:)
Hi, student of @quapka here! Thank you for all the useful comments and lots of help. I got a question. Is bitcoinfuzz being run/running somewhere periodically/constantly? It would help to know whether we need to set up a dedicated machine for my project or not. Thanks in advance:)
Yes, I run it constrantly on some personal servers. Also, we hope to get in OSS-Fuzz.
Yes, I run it constrantly on some personal servers.
We will likely run it locally as well, but good to know.
Also, we hope to get in OSS-Fuzz.
Ah, I see that would be nice, indeed!