cve-bin-tool

Implementing fuzzing in the project

Open yashugarg opened this issue 3 years ago • 10 comments

This thread is just to discuss and figure out how to move forward with implementing fuzzing in the project.

yashugarg, Aug 02 '22 18:08

While going through the Atheris code @terriko implemented for CI in the project, I noticed that the basic setup for each module is quite similar. I propose a base setup class to which specific hooks can be added when fuzzing particular methods.
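A minimal sketch of the proposed base setup class, assuming each module's harness differs only in how fuzz bytes are turned into input and which function is called. `FuzzHarness`, its hook names and `JsonLoadsHarness` are hypothetical; only `atheris.Setup` and `atheris.Fuzz` are real Atheris calls, and the example target is the standard-library `json.loads` rather than a cve-bin-tool parser.

```python
import json
import sys


class FuzzHarness:
    """Hypothetical base class for the Atheris setup shared by each module.

    Subclasses customise behaviour through three hooks:
      * prepare(data): turn raw fuzzer bytes into the target's input
      * target(value): call the function under test
      * expected_exceptions: errors the target may legitimately raise
    """

    expected_exceptions: tuple = ()

    def prepare(self, data: bytes):
        # Default hook: lenient UTF-8 decode so every input reaches the target.
        return data.decode("utf-8", errors="replace")

    def target(self, value) -> None:
        raise NotImplementedError("subclasses must implement target()")

    def test_one_input(self, data: bytes) -> None:
        value = self.prepare(data)
        try:
            self.target(value)
        except self.expected_exceptions:
            # Documented, handled failures are not bugs; anything else
            # propagates so the fuzzer reports it as a crash.
            pass

    def run(self, argv=None) -> None:
        # Imported lazily so the hooks can be exercised without Atheris.
        import atheris

        atheris.Setup(sys.argv if argv is None else argv, self.test_one_input)
        atheris.Fuzz()


class JsonLoadsHarness(FuzzHarness):
    # Example subclass: fuzz the standard-library JSON parser.
    expected_exceptions = (json.JSONDecodeError,)

    def target(self, value: str) -> None:
        json.loads(value)
```

A fuzz script would then only need to call `JsonLoadsHarness().run()` and could be launched with a libFuzzer flag such as `-runs=200000`. Importing Atheris inside `run()` keeps the hooks unit-testable on machines without it.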

yashugarg, Aug 02 '22 18:08

I think the most impactful thing we can do is switch to more structure-aware fuzzing for the following formats:

  • CSV (in the form cve-bin-tool expects, i.e. with specific headers)
  • JSON (in the form cve-bin-tool expects for merged reports)
  • SPDX SBOM
  • CycloneDX SBOM

Then hook those directly up to the parsing functions we use, rather than going through main().
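One lightweight way to make the CSV input structure-aware without protobufs, sketched here as an assumption: keep the header fixed and let the fuzzer control only the cell contents, so every input gets past header validation and exercises the row-handling code. The `vendor,product,version` columns are assumed; check the project's input parser for the exact columns it requires. `bytes_to_csv` is a hypothetical helper.

```python
import csv
import io

# Assumed header; check cve-bin-tool's input parser for the exact columns.
HEADER = ["vendor", "product", "version"]


def bytes_to_csv(data: bytes) -> str:
    """Map raw fuzz bytes onto CSV text that always carries HEADER.

    NUL-separated chunks of the input become cells, so the fuzzer mutates
    field contents while the header and column count stay valid.
    """
    cells = [c.decode("utf-8", errors="replace") for c in data.split(b"\x00")]
    # Pad so the final row has exactly len(HEADER) cells.
    cells += [""] * (-len(cells) % len(HEADER))
    rows = [cells[i : i + len(HEADER)] for i in range(0, len(cells), len(HEADER))]
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes commas, quotes and newlines as needed
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

In a harness, the returned text would be written to a file and handed straight to the CSV parsing function instead of going through main().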

Once we make it through those, we might want to focus on the formats we use for vulnerability data:

  • NVD
  • OSV
  • any extra data, such as the data we use for SQLite
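The same fixed-skeleton idea could apply to vulnerability data. The sketch below builds an OSV-shaped record whose keys follow the public OSV schema (`id`, `modified`, `affected`, `package`, `ranges`, `events`) while the fuzzer supplies only the leaf strings. `bytes_to_osv`, the field order and the fixed timestamp are illustrative assumptions, not cve-bin-tool code.

```python
import json

# Hypothetical order in which NUL-separated fuzz chunks fill the record.
FIELDS = ("id", "ecosystem", "name", "introduced", "fixed")


def bytes_to_osv(data: bytes) -> str:
    """Build an OSV-shaped vulnerability record from raw fuzz bytes.

    The JSON skeleton follows the OSV schema; only the leaf string values
    come from the fuzzer.
    """
    parts = [p.decode("utf-8", errors="replace") for p in data.split(b"\x00")]
    parts += [""] * (len(FIELDS) - len(parts))  # pad missing fields
    vuln_id, ecosystem, name, introduced, fixed = parts[: len(FIELDS)]
    record = {
        "id": vuln_id,
        "modified": "2022-08-02T00:00:00Z",  # fixed placeholder timestamp
        "affected": [
            {
                "package": {"ecosystem": ecosystem, "name": name},
                "ranges": [
                    {
                        "type": "ECOSYSTEM",
                        "events": [{"introduced": introduced}, {"fixed": fixed}],
                    }
                ],
            }
        ],
    }
    return json.dumps(record)
```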

If we keep going with Atheris: it uses libFuzzer under the hood, which uses "protocol buffers" for this: https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md#protocol-buffers-as-intermediate-format

Those examples aren't in Python, but there are some Python docs here: https://developers.google.com/protocol-buffers/docs/pythontutorial

And I also had this one earmarked as maybe helpful: https://faun.pub/using-googles-protocol-buffers-in-python-basics-ac79e9a6e6a9
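The protocol-buffers approach works by having the fuzzer mutate a structured message and converting that message into the real file format before calling the parser. Below is a rough Python sketch of the converter half, using dataclasses as hypothetical stand-ins for the classes `protoc` would generate; as we understand it, with atheris_libprotobuf_mutator the harness would receive a mutated message object instead. Only the CycloneDX keys (`bomFormat`, `specVersion`, `version`, `components`) come from the CycloneDX JSON format; the rest is illustrative.

```python
import json
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for classes protoc would generate from a .proto
# schema; the mutator would change instances of these between iterations.
@dataclass
class Component:
    name: str
    version: str


@dataclass
class Bom:
    spec_version: str = "1.4"
    components: List[Component] = field(default_factory=list)


def bom_to_cyclonedx_json(bom: Bom) -> str:
    """Serialize the intermediate message into CycloneDX JSON text.

    The mutator changes the message; this converter guarantees the target
    parser always receives syntactically valid, CycloneDX-shaped JSON.
    """
    doc = {
        "bomFormat": "CycloneDX",
        "specVersion": bom.spec_version,
        "version": 1,
        "components": [
            {"type": "library", "name": c.name, "version": c.version}
            for c in bom.components
        ],
    }
    return json.dumps(doc)
```

Keeping the converter separate from the schema means the same mutated messages could also be rendered as SPDX by a second converter.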

terriko, Aug 02 '22 23:08

Sounds good! I've been playing around with Atheris and looking up projects where it's already in use. I didn't find anything extraordinary, but it's enough to get us started :) I'll start by creating JSON inputs!

yashugarg, Aug 02 '22 23:08

Those are my top concerns, but we'll eventually want to fuzz anything we parse. Anything we accept with --input or --sbom is where I'd start, but there are also opportunities in any of the language parsers, and maybe even in any file an extractor can handle. We mostly send those to external utilities, but if they fail epically we still need to handle things correctly. (Even in the worst case, where we find something in an external parser, that's just an opportunity for you to get your name on a CVE against another product!)
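For files handed to external utilities, one way to handle failures correctly is to normalise every failure mode of the subprocess into a single controlled exception that the rest of the code, and a fuzz harness, can expect. A minimal sketch assuming extractors are invoked via `subprocess.run`; `ExtractionError` and `run_extractor` are hypothetical names, not existing cve-bin-tool APIs.

```python
import subprocess


class ExtractionError(Exception):
    """Hypothetical controlled error raised when an external extractor fails."""


def run_extractor(cmd, timeout=30.0) -> bytes:
    """Run an external extraction tool and normalise every failure mode.

    Non-zero exits (including deaths by signal), hangs and a missing binary
    all become ExtractionError, so callers never see a raw subprocess error.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout, check=True)
    except subprocess.CalledProcessError as exc:
        raise ExtractionError(f"exit status {exc.returncode}") from exc
    except subprocess.TimeoutExpired as exc:
        raise ExtractionError(f"timed out after {timeout}s") from exc
    except OSError as exc:  # e.g. the tool is not installed
        raise ExtractionError(str(exc)) from exc
    return result.stdout
```

A harness could then list `ExtractionError` among its expected exceptions, so only genuinely unhandled failures count as crashes.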

terriko, Aug 02 '22 23:08

@yashugarg I started trying to get a setup for using protobufs in https://github.com/terriko/cve-bin-tool/tree/fuzz_protobuff/fuzz

I've run into dependency issues getting atheris_libprotobuf_mutator installed and have run out of time for today, so this may still be waaaay off base, but that's roughly what I had in mind, if that helps!

terriko, Aug 05 '22 00:08

@terriko

Installing libprotobuf-mutator for Atheris from source requires bazel. Visit https://docs.bazel.build/versions/master/install.html for installation instructions.

Source: https://pypi.org/project/atheris-libprotobuf-mutator/

I followed the steps and atheris_libprotobuf_mutator works now!

yashugarg, Aug 06 '22 08:08

#1873: the protobuf mutator is working now. @terriko, please check the code and confirm whether this is what you had in mind.

yashugarg, Aug 06 '22 23:08

Yeah, I got as far as starting to install Bazel, but had some trouble on the cloud VM I was using (almost certainly another missing dependency) and ran out of time before another commitment on my calendar. I'll spin up another VM this week and see if I can get it to behave.

But yes, that looks like what I had in mind for a first pass! Can you clean up the linting errors so we can get #1873 merged? Did you manage to find anything with it?

terriko, Aug 08 '22 18:08

Could we exempt the proto file from the linting tests? Linting fails on the generated Python file. Also, I didn't get any crashes from this in over 200,000 iterations, so I'll try fuzzing the parent functions that parse the CSV and JSON files.
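Fuzzing the parent functions likely means targeting code that takes a file path rather than an in-memory string. A sketch of a helper that bridges the two, assuming the parser picks its format from the file extension; `fuzz_file_parser` is hypothetical, and `parse_file` stands in for whichever CSV or JSON parsing function is being targeted.

```python
import os
import tempfile


def fuzz_file_parser(parse_file, data: bytes, suffix: str = ".csv", expected=()):
    """Feed fuzz bytes to a parser that only accepts a file path.

    Writes the bytes to a temporary file with the given suffix, calls
    parse_file(path), swallows only the `expected` exception types, and
    always removes the file afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        try:
            return parse_file(path)
        except expected:
            return None
    finally:
        os.remove(path)
```

Writing a file per iteration is slower than purely in-memory fuzzing, but it exercises the same code path users hit when they pass real files.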

yashugarg, Aug 08 '22 18:08

Yeah, go ahead and skip linting for the proto file if it's a problem. There's probably a setting in our pre-commit config for that?

terriko, Aug 08 '22 19:08

I think we can safely close this one. Thanks @yashugarg for working on this over GSoC 2022!

terriko, Oct 25 '22 22:10