[presubmit] Ban committing seed corpora.
We can't allow this, or we will slow down every clone of OSS-Fuzz (bad for CIFuzz).
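A minimal sketch of what such a presubmit check could look like. It assumes the check receives the list of changed paths and flags anything named like `<fuzz_target>_seed_corpus.zip`, the naming convention OSS-Fuzz uses for seed corpora. The function names and the single pattern are hypothetical, not the actual presubmit code.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical pattern: seed corpora are conventionally named
# <fuzz_target>_seed_corpus.zip, so a committed file with that name is a red flag.
BANNED_PATTERN = "*_seed_corpus.zip"


def find_seed_corpora(changed_paths):
    """Return the changed paths whose file name looks like a seed corpus."""
    offenders = []
    for path in changed_paths:
        name = PurePosixPath(path).name
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatch.fnmatchcase(name, BANNED_PATTERN):
            offenders.append(path)
    return offenders


def check_seed_corpora(changed_paths):
    """Presubmit-style check: report offenders and return False if any exist."""
    offenders = find_seed_corpora(changed_paths)
    for path in offenders:
        print(f"{path}: store seed corpora outside the repo and "
              "download them at build time.")
    return not offenders
```

Only file names are inspected here; a real check would probably also want the size-based rule discussed later in the thread.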
WDYT of this change? I think it will keep the repo slimmer and force users into a better integration.
Not sure how much space it currently takes, but I think this is definitely necessary sooner or later, since we are accepting more projects. One concern about its impact on fuzzing efficiency: would it significantly slow down path discovery, since we wouldn't have seeds for paths guarded by tight conditions?
I recall that some fuzzers (e.g., libFuzzer) can minimise a corpus, so maybe we could (in the short term):
- Ask owners to minimise their corpus before committing it;
- Add a CI check that verifies whether the committed corpus can be minimised further, and fail if it can.
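The "fail if it can be minimised further" check could be sketched as follows. This assumes per-seed coverage has already been collected (e.g., from a coverage build) as a hypothetical mapping of seed name to a set of coverage features. It uses a greedy set-cover approximation, which is not libFuzzer's own merge algorithm; in practice libFuzzer's `-merge=1` mode would do the minimisation.

```python
def minimize_corpus(coverage):
    """Greedy set cover: keep seeds until every feature the corpus hits is covered.

    `coverage` maps a seed name to the set of coverage features it reaches
    (hypothetical input format).
    """
    remaining = set().union(*coverage.values())
    kept = []
    while remaining:
        # Pick the seed covering the most still-uncovered features; iterating
        # over sorted names makes ties resolve to the smallest name.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & remaining))
        kept.append(best)
        remaining -= coverage[best]
    return kept


def corpus_is_minimal(coverage):
    """CI check: return False if some committed seeds add no coverage."""
    return len(minimize_corpus(coverage)) == len(coverage)
```

For example, with `{"a": {1, 2}, "b": {2}, "c": {3}}` the greedy pass keeps `a` and `c`, so the check fails because `b` is redundant. Greedy set cover is not guaranteed to find the smallest possible corpus, so this check can only err toward flagging a corpus as non-minimal less often than an exact solver would.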
The problem with this approach is that we would still add some seeds for trivial paths. In the long term, maybe we could:
- Store all known inputs (of the project) in a separate cloud bucket that isn't downloaded when cloning;
- Use a CI check to find which committed seeds reach new paths that the known inputs cannot (via minimisation, etc.);
- Add only those new seeds to the known inputs, and tell the project owner which inputs were added;
- Remove the whole newly committed seed corpus and notify the owner.
Step 2 may take a while, but the owner should be able to come back and check.
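Steps 2 and 3 of the long-term plan could be sketched like this, assuming coverage features are available both for the stored known inputs and for each newly committed seed. The function name and input shapes are hypothetical. It is a single greedy pass, so which of two overlapping seeds gets kept depends on the (sorted) processing order.

```python
def triage_new_seeds(known_features, new_seeds):
    """Split newly committed seeds into ones worth keeping and ones to drop.

    known_features: set of coverage features already reached by the stored
        known inputs (hypothetically, those kept in the separate bucket).
    new_seeds: dict mapping seed name -> set of features that seed reaches.
    Returns (added, dropped) lists of seed names.
    """
    covered = set(known_features)
    added, dropped = [], []
    # Fixed order so the report sent to the owner is reproducible.
    for name in sorted(new_seeds):
        gain = new_seeds[name] - covered
        if gain:
            added.append(name)
            covered |= gain
        else:
            dropped.append(name)
    return added, dropped
```

The `added` list is what would be copied into the known-inputs bucket and reported to the owner; everything in the committed corpus would then be removed from the repo regardless.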
Oh, the title is misleading. Seed corpora are great; I just don't want them in this repo. They don't cause problems if they are stored elsewhere.
Ah, I see, sorry. How do owners submit seeds that they would like to use (e.g., to ensure some paths will be tested)?
Store them someplace else and download them in the Dockerfile. See skia's integration for an example.
Would it make sense to make this a more general "filesize > N kb" check? E.g. I noticed https://github.com/google/oss-fuzz/tree/master/projects/woff2/corpus has a non-zip corpus with one file being ~600kb.
It would make sense. Let me do this too.
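A rough sketch of the more general "filesize > N kb" check. The 100 KB threshold and the function name are placeholders chosen for illustration; the thread does not settle on a value for N.

```python
import os

# Hypothetical threshold; the thread only proposes "filesize > N kb".
MAX_FILE_SIZE_KB = 100


def find_large_files(paths, max_kb=MAX_FILE_SIZE_KB):
    """Return (path, size_in_kb) for each existing file larger than max_kb.

    Paths that no longer exist (e.g., deleted in the change) are skipped.
    """
    too_big = []
    for path in paths:
        if os.path.isfile(path):
            size_kb = os.path.getsize(path) / 1024
            if size_kb > max_kb:
                too_big.append((path, size_kb))
    return too_big
```

With this rule, a ~600 KB file like the one in the woff2 corpus would be flagged whether or not it is zipped.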
/gcbrun trial_build.py skcms --sanitizer address memory undefined coverage --fuzzing-engine libfuzzer afl honggfuzz
/gcbrun skip