lightning icon indicating copy to clipboard operation
lightning copied to clipboard

fuzz-tests: Add a test for the gossipd-connectd interface

Open Chand-ra opened this issue 5 months ago • 11 comments

Changelog-None: connectd_req() in gossipd/gossipd.c is responsible for handling gossip messages from peers handed to it by connectd. Add a stateful test simulating its behaviour.

Checklist

Before submitting the PR, ensure the following tasks are completed. If an item is not applicable to your PR, please mark it as checked:

  • [x] The changelog has been updated in the relevant commit(s) according to the guidelines.
  • [x] Tests have been added or modified to reflect the changes.
  • [x] Documentation has been reviewed and updated as needed.
  • [x] Related issues have been listed and linked, including any that this PR closes.

CC: @morehouse

Chand-ra avatar Jul 22 '25 12:07 Chand-ra

The test results in the following LeakSanitizer error when run on its corpus:

==116428==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x619eb51001f3 in malloc (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2da1f3) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #1 0x619eb54deb3d in split_node /home/chandra/lightning/ccan/ccan/intmap/intmap.c:67:9
    #2 0x619eb54de8c2 in intmap_add_ /home/chandra/lightning/ccan/ccan/intmap/intmap.c:113:9
    #3 0x619eb53122d1 in map_add /home/chandra/lightning/gossipd/gossmap_manage.c:176:6
    #4 0x619eb530f8c6 in gossmap_manage_channel_announcement /home/chandra/lightning/gossipd/gossmap_manage.c:640:8
    #5 0x619eb53b2101 in handle_recv_gossip /home/chandra/lightning/tests/fuzz/../../gossipd/gossipd.c:205:12
    #6 0x619eb53a93b2 in run /home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd.c:543:4
    #7 0x619eb513e8f8 in LLVMFuzzerTestOneInput /home/chandra/lightning/tests/fuzz/libfuzz.c:25:2
    #8 0x619eb504c0c4 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2260c4) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #9 0x619eb504b7b9 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2257b9) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #10 0x619eb504d3d6 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2273d6) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #11 0x619eb504d8e7 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2278e7) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #12 0x619eb503addf in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x214ddf) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #13 0x619eb5065466 in main (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x23f466) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)
    #14 0x7aed7c22a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #15 0x7aed7c22a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #16 0x619eb502fdc4 in _start (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x209dc4) (BuildId: 34398486c14ff41d79a297932e621cb4f6161418)

SUMMARY: AddressSanitizer: 40 byte(s) leaked in 1 allocation(s).

INFO: a leak has been found in the initial corpus.

Chand-ra avatar Jul 22 '25 12:07 Chand-ra

The test results in the following LeakSanitizer error when run on its corpus:

I looked closer at this. The accused line of code does a naked malloc (no tal use), which makes me suspect there could be a cleanup problem on each iteration of the fuzz target, since only tal-allocated memory gets freed.

We might need to manually delete all elements from the gossmap at the end of each iteration.

morehouse avatar Jul 28 '25 21:07 morehouse

We might need to manually delete all elements from the gossmap at the end of each iteration.

This makes the target a bit messier but does seem to manage to evade the issue at hand.

Chand-ra avatar Jul 29 '25 10:07 Chand-ra

I was fixing some of the other issues with this target when it ran into yet another memory leak:

==31969==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 40 byte(s) in 1 object(s) allocated from:
    #0 0x57fcd5d431f3 in malloc (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2da1f3) (BuildId: 844798d4ffd40334b8c2f0d73bff69c289ab8025)
    #1 0x57fcd612359d in split_node /home/chandra/lightning/ccan/ccan/intmap/intmap.c:67:9
    #2 0x57fcd6123322 in intmap_add_ /home/chandra/lightning/ccan/ccan/intmap/intmap.c:113:9
    #3 0x57fcd5f52a16 in add_unknown_scid /home/chandra/lightning/gossipd/seeker.c:695:7
    #4 0x57fcd5f52830 in query_unknown_channel /home/chandra/lightning/gossipd/seeker.c:1159:2
    #5 0x57fcd5fd8f95 in process_channel_update /home/chandra/lightning/tests/fuzz/../../gossipd/gossmap_manage.c:806:3
    #6 0x57fcd5fd6e3d in gossmap_manage_channel_update /home/chandra/lightning/tests/fuzz/../../gossipd/gossmap_manage.c:983:9
    #7 0x57fcd5ff04be in handle_recv_gossip /home/chandra/lightning/tests/fuzz/../../gossipd/gossipd.c:210:12
    #8 0x57fcd5fe708c in run /home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd.c:515:4
    #9 0x57fcd5d818f8 in LLVMFuzzerTestOneInput /home/chandra/lightning/tests/fuzz/libfuzz.c:25:2
    #10 0x57fcd5c8f0c4 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/chandra/lightning/tests/fuzz/fuzz-gossipd-connectd+0x2260c4) (BuildId: 844798d4ffd40334b8c2f0d73bff69c289ab8025)
...
...
...

This time, the leak occurs when we try to send a channel_update message for a channel that the peer hasn't already received a channel_announcement for. CLN asks the seeker to query the sending peer for a channel_announcement with query_unknown_channel() which then tried to add the unknown peer to seeker->unknown_scids using uintmap_add() which causes the leak.

Chand-ra avatar Jul 30 '25 10:07 Chand-ra

This time, the leak occurs when we try to send a channel_update message for a channel that the peer hasn't already received a channel_announcement for. CLN asks the seeker to query the sending peer for a channel_announcement with query_unknown_channel() which then tried to add the unknown peer to seeker->unknown_scids using uintmap_add() which causes the leak.

Then we also need to manually free that map at the end of each iteration.

morehouse avatar Jul 30 '25 15:07 morehouse

Have you been able to run this fuzz target? I tried and hit a bunch of crashes immediately.

Only reproduces when I use multiple workers, and seems multiple workers are trying to touch the same file (yikes).

Yeah, we discussed this over our meeting a while ago, and we decided upon creating a new file for each fuzz run. I tried to do that but the file name needs to be statically defined using #define GOSSIP_STORE_FILENAME otherwise the rename() here starts complaining.

The next best thing I could muster up was resetting the gossip_store file with unlink(GOSSIP_STORE_FILENAME) at the start of each run.

Chand-ra avatar Aug 06 '25 11:08 Chand-ra

Yeah, we discussed this over our meeting a while ago, and we decided upon creating a new file for each fuzz run. I tried to do that but the file name needs to be statically defined using #define GOSSIP_STORE_FILENAME otherwise the rename() here starts complaining.

Ugh, that's tricky. It kind of seems like this fuzz target would do better with an out-of-process fuzzing engine.

We can try to hack around the issue though. Maybe we could redefine GOSSIP_STORE_FILENAME to be some function call get_gossip_store_filename(), which then returns a thread-local temp file name.

morehouse avatar Aug 06 '25 21:08 morehouse

We can try to hack around the issue though. Maybe we could redefine GOSSIP_STORE_FILENAME to be some function call get_gossip_store_filename(), which then returns a thread-local temp file name.

In C, only adjacent string literals are concatenated at compile time, so the GOSSIP_STORE_FILENAME has to be a string literal. I tried using a function as a substitute but get a compile error at the expansion:

rename(get_gossip_store_filename(), get_gossip_store_filename() ".corrupt")

Chand-ra avatar Aug 08 '25 13:08 Chand-ra

In C, only adjacent string literals are concatenated at compile time, so the GOSSIP_STORE_FILENAME has to be a string literal. I tried using a function as a substitute but get a compile error at the expansion:

For now we could just change that location to use tal_strcat. This would allow the fuzz target to actually work, so we can run it and see if we find any bugs. If the target finds bugs, then it's probably worth changing that location permanently so this we can use this fuzz target can serve as a regression test.

morehouse avatar Aug 08 '25 21:08 morehouse

For now we could just change that location to use tal_strcat. This would allow the fuzz target to actually work, so we can run it and see if we find any bugs. If the target finds bugs, then it's probably worth changing that location permanently so this we can use this fuzz target can serve as a regression test.

I ran the target for quite some time and it doesn't seem to be able to discover any newer vulnerabilities.

Chand-ra avatar Aug 09 '25 09:08 Chand-ra

I'm deferring this one, as it will need significant rework for the rebase. In particular, we are going to have a way of overriding entropy, so this should use that.

rustyrussell avatar Nov 10 '25 05:11 rustyrussell