copacetic icon indicating copy to clipboard operation
copacetic copied to clipboard

[REQ] support for sarif as input

Open sozercan opened this issue 2 years ago • 19 comments

What kind of request is this?

New feature

What is your request or suggestion?

Today, copa supports trivy json as the input. We should also support sarif as input for generic results.

Scanners like trivy and grype supports exporting as sarif. https://aquasecurity.github.io/trivy/v0.37/docs/vulnerability/examples/report/#sarif

sozercan avatar Feb 07 '23 23:02 sozercan

Hi @salaxander @sozercan is this being actively worked on ? or can I work on it?

reetasingh avatar Feb 12 '23 08:02 reetasingh

@reetasingh sure. let's start with a design doc on how this would work especially if it can work across multiple scanners. feel free to use google docs or whatever you can publicly share and we can review the design before any implementation

sozercan avatar Feb 13 '23 19:02 sozercan

Hi @sozercan sure, will get started with design doc

reetasingh avatar Feb 19 '23 05:02 reetasingh

Hi @sozercan sorry for late update on this. I have been actively investigating this issue. I noticed that the sarif output for trivy does not have information about OS Family, OS Version and OS Architecture https://gist.github.com/reetasingh/85f890a47571dd3a908124c292c5e001

Also verified in the code that information is not being passed https://github.com/aquasecurity/trivy/blob/main/pkg/report/sarif.go

If I understand it correctly, this information would be needed to decide the package manager and hence its a blocker to implement this feature. Let me know if my understanding is correct

reetasingh avatar Mar 01 '23 07:03 reetasingh

/assign

reetasingh avatar Mar 01 '23 07:03 reetasingh

@reetasingh thanks for the investigation! I think we would need that information and package names/versions so this might be a blocker for adapting sarif as input in the short term.

sozercan avatar Mar 07 '23 22:03 sozercan

Thanks @sozercan , here is the high level design doc https://docs.google.com/document/d/1ZtuJDxDY_DrWTtQNE8ME0-WkkcQUh9x-IM9LSxoBnsw/edit#heading=h.lgtokumb4lep I have mentioned absence of information about OS Family, OS Version and OS Architecture as a blocker

Let me know your opinions on this doc

reetasingh avatar Mar 11 '23 19:03 reetasingh

thanks @reetasingh for the investigation and the doc! 🙏

Copying the blocker from your doc here:

The biggest blocker is that the Sarif output report generated by Trivy tool does not have information about OS Family, OS Version and OS Architecture https://gist.github.com/reetasingh/85f890a47571dd3a908124c292c5e001
Also verified in the code that information is not being passed https://github.com/aquasecurity/trivy/blob/main/pkg/report/sarif.go

This information is a must to have for later processing in the copacetic tool which needs this information to decide the package manager to use to patch the image. 

We can create an issue on the [Trivy github repo](https://github.com/aquasecurity/trivy) and if required implement PR to add this info. 
This can be a little tricky because the Sarif Report format does not have a dedicated field for storing info about OS Family, OS Version and OS Architecture  so this needs to be shoved in one of the help text fields.

sozercan avatar May 16 '23 18:05 sozercan

What about the SARIF output from Grype? Could that be made to work today?

craigbox avatar May 22 '23 15:05 craigbox

@craigbox same applies to grype too unfortunately. example: https://snips.sh/f/2C3_JBXLeC?r=1

sozercan avatar May 22 '23 16:05 sozercan

@toddysm FYI

ritazh avatar Jul 07 '23 00:07 ritazh

Hello! I'm co-editor of the SARIF standard and someone brought this thread to my attention. I just need a bit more context, if someone can provide it. If I think about OS details, these could be relevant to many things covered in SARIF including: 1) the machine environment for a tool invocation, 2) a distributed agent environment where a scan occcurred, 3) metadata for a literal scan target, e.g., a VM that's getting analyzed, 4), metadata for rules, e.g., a set of OS version that are relevant for a specific check).

My background is static analysis at the binary/code level, so your additional context will be helpful. I'll go do some due diligence around trivy and your project to help get oriented. I wanted to reply here immediate so you know I'm trying to jump in.

SARIF has some existing properties, some of which could be relevant/useful. If there are gaps, SARIF has clear mechanisms for packaging arbitrary supporting data, as property bags attached to other data. If you are an aggregating system, you could help define conventions for carrying/storing this missing data across multiple tools that produce conceptually similarly things. Finally, the SARIF 2.2 design effort just kicked off; moving forward we can request non-breaking changes to the format, if helpful, that address your needs. Here's where those requests would get filed: https://github.com/oasis-tcs/sarif-spec/

michaelcfanning avatar Jul 20 '23 19:07 michaelcfanning

@anubhav06 is working on Kubescape for the LFX Mentorship program, and I've asked him to look at integrating Kubescape as a source of vulnerability data for copa.

Kubescape uses Grype as its engine, but we can mark the data up as required to make this possible. I've recommended that he attempt to use SARIF for the interchange format. We'll open other issues as required to discuss the integration but I wanted to flag this here.

craigbox avatar Jul 23 '23 08:07 craigbox

@camaleon2016 FYI

ritazh avatar Aug 04 '23 23:08 ritazh

Hey @sozercan Let's say that we are able to add the missing OS information somewhere in the SARIF output for the different scanners (say, in the help text field). So according to the design document which @reetasingh has proposed, copa currently has two report parsers (trivy and qualys) and we can expand these individual parsers to support multiple input formats (SARIF, in this case).

However, won't this mean that for every new scanner apart from what copa currently supports (like Grype, Kubescape, etc.), we will have to write a new parser file for it? In #59 you say that:

The intention is for others to contribute without each scanner being part of the core copa.

But, here in the case of SARIF support as well, all the other scanners will still be part of the core copa, no?

Just trying to understand this clearly. Please correct me if I'm wrong.

anubhav06 avatar Aug 05 '23 16:08 anubhav06

I tried working on modifying grype's SARIF output format to accommodate the missing OS Name, Version and architecture, in the help text field. This is how it looks now: https://gist.github.com/anubhav06/6fc44f31dc026717cfce1836fc022294 What do you all think?

anubhav06 avatar Aug 08 '23 11:08 anubhav06

@anubhav06 in the ideal world, copa would just parse SARIF files for vuln results without the need for logic for each individual scanners.

However, as you and @reetasingh pointed out, SARIF is missing the os information. In addition to this, each scanner's SARIF outputs are different in terms of parsing package information needed for copa (package name, current version, fixed version). This means that even if we align on SARIF, copa would have to parse each scanner’s output seperately even though they are all SARIF.

here are 3 examples of CVE-2022-32221 of nginx:1.22.0 image scan results in SARIF:

Grype: https://snips.sh/f/mUtsOWhdbz

{
          "ruleId": "CVE-2022-32221-libcurl4",
          "message": {
            "text": "The path /usr/share/doc/libcurl4/copyright reports libcurl4 at version 7.74.0-1.3+deb11u3  which is a vulnerable (deb) package installed in the container"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "image//usr/share/doc/libcurl4/copyright"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              },
              "logicalLocations": [
                {
                  "name": "/usr/share/doc/libcurl4/copyright",
                  "fullyQualifiedName": "nginx:1.22.0@sha256:b9361c275c5d9b3d5a8c6cff59733193fec1be8cb57374d880e8dd8fd503ed8b:/usr/share/doc/libcurl4/copyright"
                },
                {
                  "name": "/var/lib/dpkg/info/libcurl4:amd64.md5sums",
                  "fullyQualifiedName": "nginx:1.22.0@sha256:b9361c275c5d9b3d5a8c6cff59733193fec1be8cb57374d880e8dd8fd503ed8b:/var/lib/dpkg/info/libcurl4:amd64.md5sums"
                },
                {
                  "name": "/var/lib/dpkg/status",
                  "fullyQualifiedName": "nginx:1.22.0@sha256:b9361c275c5d9b3d5a8c6cff59733193fec1be8cb57374d880e8dd8fd503ed8b:/var/lib/dpkg/status"
                }
              ]
            }
          ]
        },

Snyk: https://snips.sh/f/nvdkbzGTz6

Snyk doesn’t have CVE-2022-32221 in results for some reason, so this is from rules

{
              "id": "SNYK-DEBIAN11-CURL-3065656",
              "shortDescription": {
                "text": "Critical severity - Exposure of Resource to Wrong Sphere vulnerability in curl"
              },
              "fullDescription": {
                "text": "(CVE-2022-32221) curl/[email protected]+deb11u3"
              },
              "help": {
                "text": "",
                "markdown": "## NVD Description\n**_Note:_** _Versions mentioned in the description apply only to the upstream `curl` package and not the `curl` package as distributed by `Debian:11`._\n_See `How to fix?` for `Debian:11` relevant fixed versions and status._\n\nWhen doing HTTP(S) transfers, libcurl might erroneously use the read callback (`CURLOPT_READFUNCTION`) to ask for data to send, even when the `CURLOPT_POSTFIELDS` option has been set, if the same handle previously was used to issue a `PUT` request which used that callback. This flaw may surprise the application and cause it to misbehave and either send off the wrong data or use memory after free or similar in the subsequent `POST` request. The problem exists in the logic for a reused handle when it is changed from a PUT to a POST.\n## Remediation\nUpgrade `Debian:11` `curl` to version 7.74.0-1.3+deb11u5 or higher.\n## References\n- [ADVISORY](https://security-tracker.debian.org/tracker/CVE-2022-32221)\n- [[email protected]](https://hackerone.com/reports/1704017)\n- [[email protected]](https://security.gentoo.org/glsa/202212-01)\n- [[email protected]](https://security.netapp.com/advisory/ntap-20230110-0006/)\n- [[email protected]](https://support.apple.com/kb/HT213604)\n- [[email protected]](https://support.apple.com/kb/HT213605)\n- [[email protected]](http://seclists.org/fulldisclosure/2023/Jan/20)\n- [[email protected]](http://seclists.org/fulldisclosure/2023/Jan/19)\n- [[email protected]](https://www.debian.org/security/2023/dsa-5330)\n- [[email protected]](https://lists.debian.org/debian-lts-announce/2023/01/msg00028.html)\n- [[email protected]](https://security.netapp.com/advisory/ntap-20230208-0002/)\n- [[email protected]](http://www.openwall.com/lists/oss-security/2023/05/17/4)\n"
              },
              "defaultConfiguration": {
                "level": "error"
              },
              "properties": {
                "tags": [
                  "security",
                  "CWE-668",
                  "deb"
                ]
              }
            },

Trivy: https://snips.sh/f/fNq01xYjhm

{
          "ruleId": "CVE-2022-32221",
          "ruleIndex": 5,
          "level": "error",
          "message": {
            "text": "Package: libcurl4\nInstalled Version: 7.74.0-1.3+deb11u3\nVulnerability CVE-2022-32221\nSeverity: CRITICAL\nFixed Version: 7.74.0-1.3+deb11u5\nLink: [CVE-2022-32221](https://avd.aquasec.com/nvd/cve-2022-32221)"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "library/nginx",
                  "uriBaseId": "ROOTPATH"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              },
              "message": {
                "text": "library/nginx: [email protected]+deb11u3"
              }
            }
          ]
        },

Because of these limitations of SARIF, we need individual scanners to provide the results in an accurate way so copa can parse. However, we would like it so that anyone can integrate their own scanner without them being part of core copa. This is where modular scanners (#59) comes in so each scanner can be out of tree. @anubhav06 let me know if this answers your questions.

sozercan avatar Aug 08 '23 19:08 sozercan

Just for comparison, this is what Trivy's own JSON output provides today: https://snips.sh/f/BSl2OG1udf

...
  "Metadata": {
    "OS": {
      "Family": "debian",
    },
...
  "Results": [
    {
          "PkgName": "curl",
          "InstalledVersion": "7.74.0-1.3+deb11u3",
          "FixedVersion": "7.74.0-1.3+deb11u5",
          ...

sozercan avatar Aug 08 '23 19:08 sozercan

Yes, this answered my question. So basically since each scanner produces a different SARIF output, and we don't want each new scanner to be part of core copa , we'll be going with the modular scanner approach. There, the different scanners can be made to be compatible with copa, without actually being part of core copa.

Thanks for the clarification :)

anubhav06 avatar Aug 08 '23 20:08 anubhav06