cppcheck oss-fuzz: Add raw fuzzer

Add fuzzer that passes raw data to cppcheck.check. This fuzzer is heavily dependent on having a good seed corpus, which I have added in the corresponding PR on OSS-Fuzz: https://github.com/google/oss-fuzz/pull/10590

I added this fuzzer because looking at the coverage profile for cppcheck I noticed it's about ~25%, meaning there is room for improvement: https://introspector.oss-fuzz.com/project-profile?project=cppcheck The fuzzer added in this PR together with the corpus generation from the OSS-Fuzz PR locally gets above 50% code coverage on my machine -- likely it will grow much more when run with the CPU power of OSS-Fuzz.

If you're happy with it then once this PR is merged I'll merge https://github.com/google/oss-fuzz/pull/10590

Jun 24 '23 14:06 DavidKorczynski

Can you show some results?

I fear that 99% of bugs exposed will be for passing garbage data to cppcheck. Cppcheck is not very robust for garbage input. I think users should use their compiler to ensure the code is compilable first.

There are lots of garbage test cases here: https://github.com/danmar/cppcheck/blob/main/test/testgarbage.cpp and it does not feel like that is anywhere near completed.

I have the feeling we could spend years of development to make cppcheck more robust for bad input. But it will have no actual benefit for normal users that pass compilable code.

In theory I would like a robust product that can handle garbage input properly.

Jun 25 '23 09:06 danmar

I ran it for around 10 minutes and no issues were found, so no results as such besides the increased coverage. I'd expect things to change when run on OSS-Fuzz.

Would you like me to run it for longer locally and see if something comes up? Alternatively we can enable it to run on OSS-Fuzz and if it's too noisy we can simply remove it again.

Jun 25 '23 19:06 DavidKorczynski

Do I understand it correctly that you suspect there will not be garbage code? Only compilable code will be tested?

I ran it for around 10 minutes and no issues were found

ok. hmm I am surprised. I had the feeling that issues would be found quickly..

Would you like me to run it for longer locally and see if something comes up? Alternatively we can enable it to run on OSS-Fuzz and if it's too noisy we can simply remove it again.

Could you run it for an hour or two please?

If nothing comes up then I think merging to OSS-Fuzz could be done to test it out more thoroughly.

Jun 25 '23 20:06 danmar

Do I understand it correctly that you suspect there will not be garbage code? Only compilable code will be tested?

No, it will produce garbage code. However, the fuzzer will "start" from all of the .cpp code in this repository (building the corpus in the corresponding OSS-Fuzz PR: https://github.com/google/oss-fuzz/pull/10590/files#diff-04f92e08039112ec4b73b0230f89a2b58dc4a00c3d31c414f84bf2863f3f92bdR25-R28) and then continuously mutate over it while exploring more of the cppcheck codebase.

Could you run it for an hour or two please? If nothing comes up then I think merging to OSS-Fuzz could be done to test it out more thoroughly.

Just ran it for two hours now and no issues have come up.

Edit: it now ran into a timeout issue where the processing took 86 seconds. It didn't run into any crashes/memory corruptions though.

Jun 26 '23 12:06 DavidKorczynski

No, it will produce garbage code.

so whats the plan when there is a crash for non-compilable code. how do we tell oss-fuzz that we don't care about that testcase.

Jun 27 '23 11:06 danmar

wild idea; could you run gcc -fsyntax-only or some lightweight syntax checker before passing the code to cppcheck?

Jun 27 '23 11:06 danmar

so whats the plan when there is a crash for non-compilable code. how do we tell oss-fuzz that we don't care about that testcase.

Hmm, I guess there isn't a plan besides marking them as WontFix? I think if the idea is that anything triggered by garbage input is not something to consider then this fuzzer doesn't really make sense. That said, a bug triggerable by garbage input doesn't imply it's not triggerable by syntax-valid input.

wild idea; could you run gcc -fsyntax-only or some lightweight syntax checker before passing the code to cppcheck?

Hmm, let me think about it. In general I think it's a valid idea of a having a syntax checker up as a precondition before passing it to cppchecker. Let me check the options for it.

Jun 27 '23 15:06 DavidKorczynski

That said, a bug triggerable by garbage input doesn't imply it's not triggerable by syntax-valid input.

Yes that is true.

Jun 28 '23 14:06 danmar

I think slightly improving the fuzzing of garbage code is fine.

wild idea; could you run gcc -fsyntax-only or some lightweight syntax checker before passing the code to cppcheck?

That makes total sense and is not wild at all. I think validating the generated data is actually how you guide a fuzzer towards generating actual useful input.

Maybe one of the LLVM fuzzers already provides such inputs which could be leveraged or used as inspiration.

Jul 27 '23 11:07 firewave

That makes total sense and is not wild at all. I think validating the generated data is actually how you guide a fuzzer towards generating actual useful input.

The issues with running gcc -fsyntax-only are:

Launching a process in each fuzz iteration is expensive and will likely slow down the fuzzer a lot.
gcc will not be instrumented with coverage guided instrumentation. As such, it will be a black-box and the coverage-feedback of the fuzzing will not work, which will likely make the fuzzer unable to come up with valid syntax and never call into cppcheck anyways.

For those reasons I was going to look for a lightweight syntax checker as also suggested, but have not made progress there. I think my personal preference would be to use the fuzzer as is and if it reports bugs that are of no interest then remove or disable it.

Jul 27 '23 11:07 DavidKorczynski

I think slightly improving the fuzzing of garbage code is fine.

Problem is that in my experience it's too easy to trigger crashes on garbage code. As far as I know, a few minutes of fuzzing with garbage will trigger a crash. I fear we will end up fixing thousands and thousands of garbage cases.. and see very little real code crashes..

We don't need to run such fuzzing on some infrastructure. It can be executed locally.

I want that cppcheck handles non-standard code therefore it would not be ideal to add a strict syntax check in cppcheck.

Aug 07 '23 12:08 danmar

I believe we can have a ready made script in the repo that I can run if I want to run this fuzzing locally.

So feel free to consider creating some standalone script..

Aug 07 '23 14:08 danmar

I came across a case of incomplete code which triggered a crash which gave me an idea how we could "fuzz" our project in a more "sensible" way compared to actual fuzzing.

We could tokenize valid code and assemble it one by one and pass that to the analysis. Or remove random tokens. That would be way closer to real-life examples then procedural generated code.

I filed https://trac.cppcheck.net/ticket/12357 about it.

Jan 17 '24 11:01 firewave

After finally understanding fuzzing I ended up with a conclusion which matches this in #6119. We do not need an additional fuzzer but simply can apply these changes to the existing one.

As the corpus we should simply use the *.cpp files from the samples folder and not our whole source. That way we have a fixed corpus instead one which changes with each revision.

Mar 13 '24 09:03 firewave