LibAFL Add Stoppable trait to State which exposes an API to stop the fuzzer

Jun 18 '24 09:06 R9295

I'm not sure about the name. How about should_stop_fuzzing or similar?

Jun 18 '24 09:06 domenukk

Yeah, I wanted to keep it in line with AFL++ but I'll change it

Jun 18 '24 09:06 R9295

clippy complains.

error[E0277]: the trait bound `S: libafl::state::HasShouldStopFuzzing` is not satisfied
   --> src/fuzz.rs:68:8
    |
68  |     F: Fuzzer<E, EM, ST, State = S>,
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `libafl::state::HasShouldStopFuzzing` is not implemented for `S`

Also, should the trait just be called Stoppable or something?

Jun 18 '24 15:06 domenukk

We can do stoppable, but Has... is more inline with other traits.

Jun 18 '24 16:06 R9295

@domenukk I was also thinking of introducing a new Event::Stop for multi-threaded fuzzing. What do you think?

Jun 19 '24 11:06 R9295

And also adding this to load_file & generate_initial_internal so seed loading can be stopped.

Jun 19 '24 11:06 R9295

We can do stoppable, but Has... is more inline with other traits.

I was thinking we can reuse that trait for other parts like the stage. But maybe you're right. I'll leave it up to you, to be aligned either let's do Stoppable and should_stop (no fuzzing) or go back to the Has version.

WRT Event::Stop, right now it works via signals but I can see it won't work for all setups, so yes let's do it.

Jun 20 '24 00:06 domenukk

No let's keep Stoppable then, I had not thought of it's re-use potential.

Jun 21 '24 08:06 R9295

Hmm this is still crashing when using LlmpRestartingEventManager and firing Event::Stop from a Stage.

Fuzzer-respawner: Storing state in crashed fuzzer instance did not work, no point to spawn the next client! This can happen if the child calls `exit()`, in that case make sure it uses `abort()`, if it got killed unrecoverable (OOM), or if there is a bug in the fuzzer itself. (Child exited with: 0)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Jun 21 '24 12:06 R9295

Usually if you use mgr.on_restart(&mut state)?; in order to store the state or mgr.send_exiting()?; the respawner shouldn't complain. Are you sure you are actually calling send_exiting?

Jun 21 '24 12:06 domenukk

as far as i understand the stop event is not created at all so far, why would you need to ask for stopping the fuzzer from the state? also there is #2334 that is kind of related. once we received a stop signal, we store that we wish to stop fuzzing and we exit at the end of the fuzzing run to avoid state corruption, so that we can dump it on disk. the thing is we cannot access the state struct from the signal handler (with the current implementation at least), so i guess the goal is different?

Jun 21 '24 13:06 rmalmain

as far as i understand the stop event is not created at all so far, why would you need to ask for stopping the fuzzer from the state? also there is #2334 that is kind of related. once we received a stop signal, we store that we wish to stop fuzzing and we exit at the end of the fuzzing run to avoid state corruption, so that we can dump it on disk. the thing is we cannot access the state struct from the signal handler (with the current implementation at least), so i guess the goal is different?

The Stoppable trait would be the thing that stops fuzzing from a signal handler for example, the event is just a bonus to stop opther nodes as well (optionally) So I think you concurrently implemented the same thing? :D

Jun 21 '24 14:06 domenukk

I don't think so, we don't have access to the state from the signal handler. i don't think it is easy to fix, since we could have a signal happening while we are mutating the state, so we cannot use static mut (except if we use synchronization primitives, but we would have an overhead for each access to the state, which is not something we want i believe)

Jun 21 '24 15:06 rmalmain

Ah right, I agree - we definitely need to set the flag outside of the state. Both PRs should be merged somehow

Jun 21 '24 15:06 domenukk

yes, it can be interesting to link this to state. but it should not be mixed with signal handling. so maybe just refactor Restartable to get / set the static mut (we need some synchronization still, but it's limited to the static mut so it's fine. we may need to block signal during the read / write)

Jun 21 '24 15:06 rmalmain

EDIT: clarified with toka.

I'm not sure I understand. What is a signal handler in this context? Do you mean unix signal handler? @rmalmain - can't the EventManager handle the signal and subsequently dump state if it is a RestartingEventManager (by forwarding the libc::exit(CTRL_C))? Why re-implement StateRestorer's job?

For example: https://github.com/AFLplusplus/LibAFL/blob/main/libafl/src/events/llmp/restarting.rs#L597-L603

Jun 23 '24 11:06 R9295

Ok so I've figured out the issue:

Since the inner LlmpEventManager processes all events in RestartingLlmpEventManager, LlmpEventManager calling send_exiting calls LlmpEventManager::send_exiting not RestartingLlmpEventManager::send_exiting.

RestartingLlmpEventManager::send_exiting has the staterestorer.send_exiting call while the normal LlmpEventManager doesn't

This is also an issue with SimpleRestartingEventManager which has to duplicate the process function. I think we have to re-use the logic of libc::exit in fuzz loop of #2334 to accommodate both EventManager and RestartingEventManagers. Like this, users who wish to use restarting managers must save state somehow (hooks, etc) before sending Event::Stop and users who use normal event managers can simply exit.

thoughts? @rmalmain / @domenukk

Jun 23 '24 13:06 R9295

yes, the RestartingEventManager could simply register a hook for that in the inner manager, we can add dynamic hooks (back) for that(?)

Jun 25 '24 08:06 domenukk

What's the status here? CI isn't happy yet

error[E0004]: non-exhaustive patterns: `&&events::Event::Stop` not covered
   --> libafl/src/events/tcp.rs:319:15
    |
319 |         match &event {
    |               ^^^^^^ pattern `&&events::Event::Stop` not covered
    |
note: `events::Event<I>` defined here
   --> libafl/src/events/mod.rs:272:10
    |
272 | pub enum Event<I>
    |          ^^^^^
...
357 |     Stop,
    |     ---- not covered
    = note: the matched value is of type `&&events::Event<I>`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
    |
410 ~             Event::CustomBuf { .. } => Ok(BrokerEventResult::Forward),
411 ~             &&events::Event::Stop => todo!(),
    |

Jun 26 '24 22:06 domenukk

@domenukk I didn't have time. Will get to it today evening

Jun 27 '24 09:06 R9295

@domenukk what I was proposing is even simpler actually:

Currently, from what I understand, LibAFL has no standard way to save and reload state from disk. https://github.com/AFLplusplus/LibAFL/pull/2334 proposes a way to do so.

So, for this PR, instead of returning in the fuzz loop, we can simply send a libc::exit signal, just like https://github.com/AFLplusplus/LibAFL/pull/2334 is proposing. Like this, all managers, whether restarting or non-restarting, will obey it. Restarting managers won't complain that the state is not stored when exiting since libc::exit is a special clause.

People who wish to use Event::Stop must come up with their own way to serialize state when exiting. Be it using the functionality proposed in https://github.com/AFLplusplus/LibAFL/pull/2334 or some custom way through a hook, since hooks are run before and/or after the event in the client so it is guaranteed that we would get to save the state before the exit

This would make it compatible with both RestartingEventManagers and EventManagers as long as both have hooks. What are your thoughts?

We basically need to add hooks for all Managers in that case.

Jun 27 '24 18:06 R9295

Also I'm not sure why the windows build is failing, seems like it's not relevant to the PR

Jun 27 '24 18:06 R9295

And we still need the broker to exit after it's done relaying Event::Stop

Jun 27 '24 19:06 R9295

FWIW, the standard way to save and load state from disk is through Serde(?) You can just do that right before you quit.

If we have a clean way to return early from stages (for example through a should_stop flag), we'd be good? Maybe I'm not thinking something through here. But I think I prefer a clean exit over calling exit directly, especially from a library..

The Launcher already has the option to exit when at least n children have spawned and all children have exited.

Jun 28 '24 07:06 domenukk

Er, why not just return Err(Error::ShuttingDown) from a stage?

Jun 28 '24 10:06 addisoncrump

@domenukk we should be good now

Jul 02 '24 13:07 R9295

Note that I still prefer cancel :P

Jul 02 '24 15:07 domenukk