add local stop-experiment logic (by renaming docker containers)
Description
This PR attaches an instance_name to the dispatcher and runner Docker containers: d-<experiment_name> and r-<experiment_name>-<trial_id>, respectively. It then provides FuzzBench-wide and experiment-wide Docker cleanup options in make:
- make stop-trials [EXPERIMENT=<experiment_name>]: stop trials.
- make stop-experiment [EXPERIMENT=<experiment_name>]: stop trials, then the dispatcher container.
When EXPERIMENT is set, only that experiment's containers are cleaned up.
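A minimal Python sketch of the naming scheme and of which containers each target would pick up. It assumes names are exactly d-<experiment_name> and r-<experiment_name>-<trial_id> with a numeric trial ID. The helpers dispatcher_name, runner_name, and select_containers are hypothetical illustrations, not FuzzBench code (the PR implements the selection in the Makefile).

```python
import re
from typing import Iterable, List, Optional


def dispatcher_name(experiment: str) -> str:
    """Dispatcher container name: d-<experiment_name>."""
    return f"d-{experiment}"


def runner_name(experiment: str, trial_id: int) -> str:
    """Trial runner container name: r-<experiment_name>-<trial_id>."""
    return f"r-{experiment}-{trial_id}"


def select_containers(names: Iterable[str], target: str,
                      experiment: Optional[str] = None) -> List[str]:
    """Return the containers `target` would remove, runners first.

    target is "stop-trials" or "stop-experiment" (hypothetical mirror of the
    make targets). Without an experiment, every container that looks like a
    runner (or dispatcher) is selected.
    """
    exp = re.escape(experiment) if experiment else ".+"
    # Anchored patterns: experiment "a" must not select runners of "a-1".
    runner_re = re.compile(rf"^r-{exp}-\d+$")
    dispatcher_re = re.compile(rf"^d-{exp}$")
    names = list(names)
    selected = [n for n in names if runner_re.match(n)]
    if target == "stop-experiment":
        # Trials are removed before the dispatcher.
        selected += [n for n in names if dispatcher_re.match(n)]
    elif target != "stop-trials":
        raise ValueError(f"unknown target: {target}")
    return selected
```

For example, with names = ["d-exp1", "r-exp1-1", "r-exp1-2", "d-exp2", "r-exp2-1"], select_containers(names, "stop-experiment", "exp1") returns ["r-exp1-1", "r-exp1-2", "d-exp1"].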
We pass --volumes --force so that the containers' volumes are removed at the same time and running containers are removed forcibly.
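Both flags fit into a single docker rm call: Docker documents --force as removing a running container (it is killed first) and --volumes as also removing the anonymous volumes associated with the container, so no separate docker stop step is needed. A minimal Python sketch, with hypothetical function names, that builds and runs that command:

```python
import subprocess
from typing import List, Sequence


def docker_rm_command(container_names: Sequence[str]) -> List[str]:
    """Build one `docker rm` invocation that force-removes the containers.

    --force kills a running container before removing it; --volumes also
    removes anonymous volumes attached to those containers.
    """
    if not container_names:
        # docker rm itself refuses to run without at least one name.
        raise ValueError("no containers to remove")
    return ["docker", "rm", "--volumes", "--force", *container_names]


def remove_containers(container_names: Sequence[str]) -> None:
    """Run the command; check=True raises CalledProcessError on failure."""
    subprocess.run(docker_rm_command(container_names), check=True)
```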
It also adds some initial logic for stopping a local experiment.
Motivation and Context
Related to #470 and #479: these options are helpful for local users and prepare for end-to-end local tests.
How Has This Been Tested?
I started two experiments and killed them after their trial runners had started. I then stopped one experiment's Docker containers and, afterwards, ran the FuzzBench-wide cleanup. Both worked.
Other
Fixed the unit tests to comply with the new format of the runners' Docker startup scripts.
As suggested above, a single command, make stop-trials, may be cleaner: if EXPERIMENT is set, we remove only the containers related to that experiment; otherwise, we remove all trials.
Also, we use docker rm --force directly instead of the two steps docker stop and docker rm.
Since each experiment also has a running dispatcher container, a second command, make stop-experiment, first runs make stop-trials and then removes the related dispatcher container (or all dispatcher containers when EXPERIMENT is not set).
The measurer currently runs as a process inside the dispatcher container. Let's update or separate that logic later, once measurers have a separate implementation.
I'm wondering if it would make sense to have a dedicated utility, rather than make, for updating and controlling running experiments and tasks, instead of cluttering the Makefile.
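As an illustration of that suggestion, a minimal sketch of such a utility using Python's argparse, where each current make target becomes a subcommand taking an optional --experiment. The program name experiment-ctl and its output are hypothetical; the sketch only parses arguments and does not remove anything.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical experiment-control tool replacing make targets."""
    parser = argparse.ArgumentParser(
        prog="experiment-ctl",  # hypothetical name
        description="Control locally running experiments.")
    subcommands = parser.add_subparsers(dest="command", required=True)
    for name, help_text in [
            ("stop-trials", "remove trial runner containers"),
            ("stop-experiment", "remove trial runners, then the dispatcher"),
    ]:
        sub = subcommands.add_parser(name, help=help_text)
        sub.add_argument("--experiment", default=None,
                         help="limit to this experiment (default: all)")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    scope = args.experiment or "all experiments"
    print(f"{args.command}: {scope}")  # placeholder for the real cleanup
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

New actions then become new subcommands rather than new Makefile targets.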
Yes, I'm hoping this will get much cleaner when we switch to the queue-based architecture. In that case, each task will be a single docker run (or docker build) command, each worker will work on one task at a time, and we won't even need to run background jobs.
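A minimal sketch of that worker model, assuming each task is one command (for example a docker run or docker build argv). An in-process queue.Queue with threads stands in for the real queue and worker machines; run_workers and the injected run callable are hypothetical names, not part of FuzzBench.

```python
import queue
import threading
from typing import Callable, List, Sequence

Command = List[str]


def run_workers(tasks: Sequence[Command], num_workers: int,
                run: Callable[[Command], int]) -> List[int]:
    """Drain a queue of single-command tasks and return their exit codes.

    `run` executes one command in the foreground and returns its exit code
    (in practice, e.g., subprocess.run(command).returncode).
    """
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))
    results = [0] * len(tasks)

    def worker() -> None:
        # Each worker handles exactly one task at a time.
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            results[index] = run(task)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Keeping run injectable lets the loop be exercised without Docker.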
I agree. By then, we should have a way to show the status of currently running jobs. We also need to provide a way to pause and resume an experiment, and to add or remove workers for running jobs.
At that point, we should also reconsider using make * targets to control the experiment status.
I think part of the issue here is that the scheduler can start Docker containers and GCE instances but can only delete GCE instances, not containers. Teaching gcloud.delete_instances to stop containers would fix this problem and make stop_experiment work properly.
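A hypothetical sketch of that idea: a helper that, given the same instance names the scheduler already tracks, returns the gcloud deletion command on Google Cloud and a docker rm command for a local experiment. The name delete_commands and its parameters are assumptions and do not reproduce the actual signature of gcloud.delete_instances.

```python
from typing import List, Sequence


def delete_commands(instance_names: Sequence[str], zone: str,
                    local: bool) -> List[List[str]]:
    """Commands that would delete the given trial instances (hypothetical).

    On Google Cloud the names refer to GCE instances; in a local experiment
    the same names refer to Docker containers, which are force-removed.
    """
    if not instance_names:
        return []  # nothing to delete
    if local:
        return [["docker", "rm", "--force", *instance_names]]
    return [["gcloud", "compute", "instances", "delete", "--quiet",
             f"--zone={zone}", *instance_names]]
```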
> We also need to provide a way to pause and resume an experiment, and to add or remove workers for running jobs.
There's no way to pause a trial, so, other than restarting parts of an experiment (e.g., the measurer), this is not possible.
What happens when someone runs make without specifying a target? Does it build just the all target, or all of them? I ask because some of our targets, like clear-cache, would produce weird results if "built" in this scenario.
> There's no way to pause a trial, so, other than restarting parts of an experiment (e.g., the measurer), this is not possible.
Could you give some details on why this is not possible?
It might actually be somewhat doable using docker pause and docker unpause. But I think there's little guarantee this works cleanly, since fuzzers could be using things like timers that will be incorrect after unpausing; the same goes for our tasks in the queue. I could be wrong, but I would guess this feature is very complicated to get right.
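To make the timer concern concrete: assuming a trial enforces its time budget by comparing wall-clock timestamps (a hypothetical simplification, not necessarily how any particular fuzzer or FuzzBench works), any time the container spends frozen by docker pause still counts against the budget. The simulation below advances a fake clock instead of sleeping; fuzzing_seconds is a hypothetical name.

```python
def fuzzing_seconds(budget: float, pause_at: float, pause_len: float,
                    step: float = 1.0) -> float:
    """Seconds a hypothetical trial spends fuzzing before its budget expires.

    The trial checks elapsed wall-clock time against `budget`; a single
    docker-pause-like freeze of `pause_len` seconds starts at `pause_at`.
    """
    now = 0.0     # fake wall clock, advanced by hand instead of sleeping
    fuzzed = 0.0  # time actually spent fuzzing
    paused = False
    while now < budget:
        if not paused and now >= pause_at:
            now += pause_len  # frozen: the clock moves, fuzzing does not
            paused = True
            continue
        now += step
        fuzzed += step
    return fuzzed
```

With a 100 s budget and a 30 s pause at 40 s, fuzzing_seconds(100.0, 40.0, 30.0) returns 70.0: the paused time is silently lost from the trial.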
Hi, does anyone have an idea what causes this weird behavior?
~/fuzzbench$ docker ps -f "name=r-*-[0-9]+$"
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
~/fuzzbench$ docker ps -f "name=-*-[0-9]+$"
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
751a7a887381 gcr.io/fuzzbench/runners/fairfuzz/zlib_zlib_uncompress_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-22
34ea57f1418b gcr.io/fuzzbench/runners/aflfast/zlib_zlib_uncompress_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-17
7e114bc325cf gcr.io/fuzzbench/runners/aflfast/jsoncpp_jsoncpp_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-19
c9c54f0359f5 gcr.io/fuzzbench/runners/fairfuzz/jsoncpp_jsoncpp_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-23
878d472e46c5 gcr.io/fuzzbench/runners/fairfuzz/zlib_zlib_uncompress_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-21
88d1ef35e6e5 gcr.io/fuzzbench/runners/aflfast/jsoncpp_jsoncpp_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-20
522c0b1258ef gcr.io/fuzzbench/runners/fairfuzz/jsoncpp_jsoncpp_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-24
785f34788e97 gcr.io/fuzzbench/runners/aflfast/zlib_zlib_uncompress_fuzzer "/bin/sh -c $ROOT_DI…" 14 minutes ago Up 13 minutes r-workingsee-18
Previously, when we used fuzzbench as the prefix, there was no such misbehavior.
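One likely explanation, assuming Docker interprets the name filter value as a regular expression rather than a shell glob (Go's regexp syntax agrees with Python's re for these patterns): in r-*-[0-9]+$, the * repeats the preceding -, so the pattern needs an r followed directly by hyphens and then digits, which r-workingsee-22 does not contain. A quick check with Python's unanchored re.search, using names from the docker ps output above:

```python
import re

# Container names taken from the docker ps output above.
names = ["r-workingsee-22", "r-workingsee-17", "r-workingsee-19"]


def matching(pattern: str) -> list:
    """Names the pattern matches anywhere in (unanchored search)."""
    return [name for name in names if re.search(pattern, name)]


# In a regex, '*' repeats the preceding '-'; it does not mean "any text".
print(matching(r"r-*-[0-9]+$"))   # []
# Matches any name that ends in -<digits>, so all runners show up.
print(matching(r"-*-[0-9]+$"))    # ['r-workingsee-22', 'r-workingsee-17', 'r-workingsee-19']
# '.*' is the regex spelling of the intended glob 'r-*-<digits>'.
print(matching(r"r-.*-[0-9]+$"))  # ['r-workingsee-22', 'r-workingsee-17', 'r-workingsee-19']
```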