Corpus of master and workers are not synchronized

Open ducphanduyagentp opened this issue 4 years ago • 5 comments

For some reason, the corpus of the workers and master are not synchronized

On one worker:

➜  corpus grep -r "\-0" .
./program_20210212125318_18DF92D1-1B7E-4B1E-97BD-53DE0883C10C.js:        const v22 = Object.is(-0.0,v18);
./program_20210212125128_36F54B2D-9E6E-4558-B47A-398B02BDB185.js:        const v23 = Object.is(-0.0,-4294967296);

On master:

➜  corpus grep -r "\-0" .
./program_20210212054656_EA9ACAA1-A5F1-4E8D-9034-2FE4B4346A33.js:        const v32 = -0;

Is this a problem or am I missing something here? I am still investigating this because it might have caused the coverage stats to differ between the master and the workers.

ducphanduyagentp avatar Feb 12 '21 05:02 ducphanduyagentp

Right, so basically an instance will only import a program from another one (master or worker) if the sample also triggers new coverage on the importing instance. Because of that, you will naturally have somewhat different corpuses on different instances. So unless the corpuses are entirely different, I think what you are observing is normal.
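The import check described above can be sketched as a toy model (Python for illustration; the class and field names here are made up, not Fuzzilli's actual Swift API):

```python
class Instance:
    """Toy model of a fuzzer instance; illustrative only."""

    def __init__(self, name):
        self.name = name
        self.covered_edges = set()  # coverage edges seen by this instance so far
        self.corpus = []            # programs this instance has kept

    def execute(self, program):
        # Stand-in for running the program in the target engine: here a
        # "program" is simply the set of edges it triggers.
        return set(program["edges"])

    def try_import(self, program):
        """Keep an imported program only if it triggers new coverage here."""
        new_edges = self.execute(program) - self.covered_edges
        if not new_edges:
            return False  # nothing new on this instance -> discard
        self.covered_edges |= new_edges
        self.corpus.append(program)
        return True
```

Since each instance runs this check against its own coverage map, two instances receiving the same stream of programs can still end up with different corpuses.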

saelo avatar Feb 12 '21 09:02 saelo

Thank you for clarifying. At the moment, what I am also observing is that coverage on the master instance is lower than on some of the workers. Reading the NetworkSync module, my understanding is that the network master does not apply dropout when it receives a program from workers, while the workers do, so it makes sense that the corpus slightly differs across workers. But shouldn't the coverage on the master always be the greatest, and shouldn't the master corpus reproduce that coverage?

ducphanduyagentp avatar Feb 13 '21 04:02 ducphanduyagentp

So by default, dropout is disabled, but you can explicitly enable it, and then it will only affect imports on workers, as you said. What you are observing is probably the result of non-deterministic samples: a worker finds a sample that triggers new coverage, but does so non-deterministically. It then sends that program to the master, where it doesn't trigger the same coverage again and thus is discarded. As a result, the worker now has more discovered edges (== higher coverage) than the master. This mechanism acts as a form of "natural", implicit filter against non-deterministic programs in the corpus. An explicit filter for that is planned, too.
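The non-determinism effect described above can be illustrated with a tiny simulation (the edge numbers are hypothetical; edge 42 stands for coverage that only triggers on some executions):

```python
def triggers_new_coverage(run_edges, known_edges):
    """An instance keeps a sample only if this run hit at least one new edge."""
    return bool(run_edges - known_edges)

# On the worker, a flaky execution happened to hit edge 42, so the sample is kept.
worker_known = {1, 2}
worker_run = {1, 2, 42}          # lucky execution: flaky edge 42 fired
assert triggers_new_coverage(worker_run, worker_known)
worker_known |= worker_run       # worker's coverage now includes edge 42

# The sample is sent to the master, but there the flaky edge does not fire.
master_known = {1, 2}
master_run = {1, 2}              # unlucky execution: edge 42 absent
assert not triggers_new_coverage(master_run, master_known)  # discarded

# Net result: the worker's coverage is strictly higher than the master's.
assert len(worker_known) > len(master_known)
```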

saelo avatar Feb 15 '21 08:02 saelo

Another behavior I've noticed is that after shutting down the master and restarting it with the --resume option, its corpus has fewer samples than before. Is this also caused by non-deterministic samples and therefore normal, or should we do something to keep as many samples as possible?

ducphanduyagentp avatar Feb 16 '21 02:02 ducphanduyagentp

Yeah this is probably due to two effects: (1) non-deterministic programs being removed and (2) programs whose coverage is a superset of other programs replacing those. Although I haven't investigated this thoroughly, so there might be other factors as well. In any case, once we have a proper --deterministic mode implemented, we should re-evaluate this.
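Effect (2) can be sketched as a greedy corpus reduction (an assumed simplification for illustration; the actual logic behind --resume may differ in detail):

```python
def minimize_corpus(corpus):
    """Drop programs whose coverage is a subset of a kept program's coverage.

    Greedy sketch: consider programs with larger coverage first, and keep a
    program only if no already-kept program covers everything it covers.
    """
    kept = []
    for prog in sorted(corpus, key=lambda p: len(p["edges"]), reverse=True):
        if not any(prog["edges"] <= k["edges"] for k in kept):
            kept.append(prog)
    return kept

corpus = [
    {"id": "A", "edges": {1, 2, 3}},
    {"id": "B", "edges": {1, 2}},   # subset of A's coverage -> dropped
    {"id": "C", "edges": {4}},      # unique edge -> kept
]
reduced = minimize_corpus(corpus)
```

This kind of reduction alone would explain a corpus shrinking across a restart even before non-deterministic samples are taken into account.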

saelo avatar Feb 17 '21 08:02 saelo