fuzzbench add local stop-experiment logic (by renaming docker containers)

Description

This PR assigns an instance name to the dispatcher and runner docker containers: d-<experiment_name> for the dispatcher and r-<experiment>-<trial_id> for each runner. It then provides fuzzbench-wide and experiment-wide docker cleanup targets in make:

make stop-trials [EXPERIMENT=<experiment_name>]: stop trials
make stop-experiment [EXPERIMENT=<experiment_name>]: stop trials and then the dispatcher container. When EXPERIMENT is set, only the containers related to that experiment are cleaned.

We use --volumes --force so that containers are force-removed together with their volumes.

This PR also adds some initial logic for stopping an experiment locally.
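
As a rough illustration of the naming scheme and the removal flags described above, here is a minimal Python sketch. The helper names (dispatcher_container_name, runner_container_name, docker_rm_command) are hypothetical and are not taken from the fuzzbench code base; only the d-/r- name formats and the --volumes --force flags come from this description.

```python
from typing import List


def dispatcher_container_name(experiment: str) -> str:
    """Hypothetical helper: dispatcher containers are named d-<experiment_name>."""
    return f"d-{experiment}"


def runner_container_name(experiment: str, trial_id: int) -> str:
    """Hypothetical helper: runner containers are named r-<experiment>-<trial_id>."""
    return f"r-{experiment}-{trial_id}"


def docker_rm_command(container_names: List[str]) -> List[str]:
    """Build (but do not run) a docker rm command that force-removes the
    given containers together with their volumes."""
    return ["docker", "rm", "--volumes", "--force", *container_names]


if __name__ == "__main__":
    names = [
        dispatcher_container_name("workingsee"),
        runner_container_name("workingsee", 22),
    ]
    print(" ".join(docker_rm_command(names)))
```

Building the command as a list keeps it easy to inspect in a unit test before handing it to something like subprocess.run.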

Motivation and Context

Related to #470 and #479: helpful options for local users and preparations for end-to-end local tests.

How Has This Been Tested?

I started two experiments and killed them after their trial runners had started. Then I stopped the docker containers of one experiment only, and afterwards ran the fuzzbench-wide clean. Both worked.

Other

Fixes unit tests to comply with the new format of the runners' docker startup scripts.

Jun 22 '20 23:06 zchcai

As suggested above, a single command, make stop-trials, may be cleaner. If EXPERIMENT is set, we remove only the trials related to that experiment. Otherwise, we remove all trials.

Also, use docker rm --force directly instead of the two steps of stop and rm.

Since a dispatcher container also runs for each experiment, another command, make stop-experiment, will run make stop-trials first and then clean the related dispatcher container, or all dispatcher containers if no experiment is given.

Measurers currently run as a process inside the dispatcher container. Let's update or separate that logic later, when measurers have a separate implementation.
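
The selection rule behind the two targets could be sketched as follows. This is only an illustration in Python, not the Makefile logic in this PR: select_containers and its parameters are hypothetical names, and the container names are assumed to follow the d-<experiment_name> and r-<experiment>-<trial_id> formats from the description.

```python
import re
from typing import Iterable, List, Optional


def select_containers(names: Iterable[str],
                      experiment: Optional[str] = None,
                      include_dispatcher: bool = False) -> List[str]:
    """Return the container names a cleanup should remove.

    include_dispatcher=False mimics stop-trials (runners only);
    include_dispatcher=True mimics stop-experiment (runners and dispatcher).
    With experiment=None, containers of every experiment are selected.
    """
    # Escape the experiment name so that characters such as "." or "+"
    # are matched literally rather than as regex operators.
    exp = re.escape(experiment) if experiment is not None else r".+"
    patterns = [rf"r-{exp}-[0-9]+"]
    if include_dispatcher:
        patterns.append(rf"d-{exp}")
    return [
        name for name in names
        if any(re.fullmatch(pattern, name) for pattern in patterns)
    ]


if __name__ == "__main__":
    running = ["r-exp1-1", "r-exp1-2", "d-exp1", "r-exp2-7", "d-exp2", "unrelated"]
    print(select_containers(running, "exp1"))
    print(select_containers(running, "exp1", include_dispatcher=True))
    print(select_containers(running))
```

Using re.fullmatch rather than a substring search avoids removing containers whose names merely contain an experiment name.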

Jun 23 '20 14:06 zchcai

I'm wondering if it'd make sense to have a dedicated utility - rather than make - for updating/controlling running experiments/tasks, instead of cluttering the Makefile.

Yes, I'm hoping that this would get much cleaner when we switch to the queue based architecture. In that case each task will be a single docker run (or docker build) command, each worker would work on one task at a time, and we wouldn't even need to run background jobs.

I agree. At that point, we should have a way to show the status of currently running jobs. We also need to provide a way to pause an experiment and resume it, and to add or remove workers for running jobs.

At that point, we should also reconsider controlling the experiment status via make *.

Jun 23 '20 16:06 zchcai

I think part of the issue here is that the scheduler can start docker containers and GCE instances but can only delete GCE instances and not containers. Teaching gcloud.delete_instances to stop containers would fix this problem and make stop_experiment work properly.

We also need to provide a way to pause an experiment and resume it, and to add or remove workers for running jobs

There's no way to pause a trial, so other than restarting parts of an experiment (e.g. the measurer), this is not possible.

What happens when someone types in make without specifying targets? Does it just build the all target or does it build all of them? Because some of our targets like clear-cache would produce weird results if "built" in this scenario.

Jun 23 '20 19:06 jonathanmetzman

There's no way to pause a trial, so other than restarting parts of an experiment (e.g. the measurer), this is not possible.

Could you give some details why this is not possible?

Jun 23 '20 19:06 zchcai

There's no way to pause a trial, so other than restarting parts of an experiment (e.g. the measurer), this is not possible.

Could you give some details why this is not possible?

It actually might be somewhat doable using docker pause and unpause. But I think there's little guarantee this works cleanly, since fuzzers could be using things like timers that will be incorrect once unpaused; the same goes for our tasks in the queue. I could be wrong, but I would guess this feature is very complicated to get right.
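
To illustrate the timer concern, here is a small Python sketch with a simulated clock. It assumes a trial tracks its time budget by comparing a start timestamp against a clock that keeps advancing while the container is frozen; whether any particular fuzzer or task does this is not established in this thread. The function name remaining_seconds and the numbers are made up for the example.

```python
def remaining_seconds(start: float, now: float, budget: float) -> float:
    """Time left in a budget measured against an externally advancing clock."""
    return max(0.0, budget - (now - start))


# A hypothetical trial with a 60 s budget starts at t=0 and does 10 s of work.
start, budget = 0.0, 60.0
before_pause = remaining_seconds(start, now=10.0, budget=budget)

# The container is then frozen for 100 s. The trial does no work, but the
# clock it compares against has moved on to t=110.
after_unpause = remaining_seconds(start, now=110.0, budget=budget)

print(before_pause, after_unpause)  # prints "50.0 0.0"
```

Under that assumption, the trial believes its budget is exhausted after only 10 s of actual fuzzing, which is the kind of inconsistency that makes pause and resume hard to get right.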

Jun 23 '20 20:06 jonathanmetzman

Hi, does anyone have ideas about this weird behavior?

~/fuzzbench$ docker ps -f "name=r-*-[0-9]+$"
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
~/fuzzbench$ docker ps -f "name=-*-[0-9]+$"
CONTAINER ID        IMAGE                                                           COMMAND                  CREATED             STATUS              PORTS               NAMES
751a7a887381        gcr.io/fuzzbench/runners/fairfuzz/zlib_zlib_uncompress_fuzzer   "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-22
34ea57f1418b        gcr.io/fuzzbench/runners/aflfast/zlib_zlib_uncompress_fuzzer    "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-17
7e114bc325cf        gcr.io/fuzzbench/runners/aflfast/jsoncpp_jsoncpp_fuzzer         "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-19
c9c54f0359f5        gcr.io/fuzzbench/runners/fairfuzz/jsoncpp_jsoncpp_fuzzer        "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-23
878d472e46c5        gcr.io/fuzzbench/runners/fairfuzz/zlib_zlib_uncompress_fuzzer   "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-21
88d1ef35e6e5        gcr.io/fuzzbench/runners/aflfast/jsoncpp_jsoncpp_fuzzer         "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-20
522c0b1258ef        gcr.io/fuzzbench/runners/fairfuzz/jsoncpp_jsoncpp_fuzzer        "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-24
785f34788e97        gcr.io/fuzzbench/runners/aflfast/zlib_zlib_uncompress_fuzzer    "/bin/sh -c $ROOT_DI…"   14 minutes ago      Up 13 minutes                           r-workingsee-18

Previously, when we used fuzzbench as the prefix, there was no such misbehavior.
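
One possible explanation, assuming docker's name filter interprets its value as a regular expression rather than a shell glob (not confirmed in this thread): in a regex, "r-*" means "r" followed by zero or more "-" characters, not "r-" followed by anything. The Python sketch below uses the re module as a stand-in to show how the two filters above behave on one of the listed names under that assumption; the third pattern is a suggested regex spelling and has not been tested against docker here.

```python
import re

name = "r-workingsee-22"

# "r-*" is "r" plus zero or more "-", so the text between "r-" and "-22"
# has nothing to match against and the search fails.
glob_like = re.search(r"r-*-[0-9]+$", name)

# Without the leading "r", the pattern only needs a "-" followed by digits
# at the end of the name, which every runner container has.
suffix_only = re.search(r"-*-[0-9]+$", name)

# ".*" is the regex way of writing "anything".
regex_form = re.search(r"r-.*-[0-9]+$", name)

print(glob_like is not None, suffix_only is not None, regex_form is not None)
# prints "False True True"
```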

Jun 24 '20 01:06 zchcai