codeql-action icon indicating copy to clipboard operation
codeql-action copied to clipboard

🐛 "SARIF URI scheme \"https\" did not match the checkout URI scheme \"file\",",

Open laurentsimon opened this issue 3 years ago • 19 comments

I'm developing a GitHub action following https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#physicallocation-object

The doc says the artifactLocation.uri: If the URI is absolute, code scanning can use the URI to checkout the artifact and match up files in the repository. For example, https://github.com/ghost/example/blob/00/src/promiseUtils.js

When I use a URL though, the analysis keeps failing Analysis processing failed.

I ran curl -u laurentimon:$GITHUB_AUTH_TOKEN -H "Accept: application/vnd.github.v3+json" https://api.github.com/repos/laurentsimon/scorecard-action-test/code-scanning/analyses and got the error message below:

"error": "SARIF URI scheme \"https\" did not match the checkout URI scheme \"file\",",

which seems to indicate https is not supported... but the doc says it is.

So my first question is: are URLs supported? I'm using them in SARIF's relatedLocation.physicalLocation.URI. If they are, what am I doing wrong?

My second question: I'd like to report related locations with urls, e.g. https://api.github.com/repos/ossf/scorecard/releases/assets/41580436, https://github.com/ossf/scorecard/releases/download/v2.1.1/scorecard_2.1.1_checksums.txt.sig. Urls may have different content types, so it's not entirely clear whether I'd need to use StartLine, CharOffset or ByteOffset depending on the content type. Ideally, ByteOffset should work regardless of content type.

Can you advise?

Thanks!

laurentsimon avatar Sep 28 '21 19:09 laurentsimon

Can you attach the relevant portion of you sarif file? Also, if your repository is public, can you share a link?

aeisenberg avatar Sep 28 '21 19:09 aeisenberg

My repo is public https://github.com/laurentsimon/scorecard-action-test File: see the link https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh I used as example.

{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "automationDetails": {
        "id": "supply-chain/scorecard/67a8e5f9f6015aace89817878733b539c0f117de-28 Sep 21 18:55 +0000"
      },
      "tool": {
        "driver": {
          "name": "Scorecard",
          "informationUri": "https://github.com/ossf/scorecard",
          "semanticVersion": "2.2.8-4-g67a8e5f-dirty",
          "rules": [
            {
              "id": "b451a10b1a462197cf89aab8af8a0140",
              "name": "Signed-Releases",
              "helpUri": "https://github.com/ossf/scorecard/blob/67a8e5f9f6015aace89817878733b539c0f117de/docs/checks.md#signed-releases",
              "shortDescription": {
                "text": "Determines if the project cryptographically signs release artifacts."
              },
              "fullDescription": {
                "text": "This check tries to determine if the project cryptographically signs release artifacts.<br>Signed releases attest to the provenance of the artifact. A low score is considered 'High' risk.<br>It works by looking for filenames: *.minisig (https://github.com/jedisct1/minisign), *.asc (pgp), *.sign. for the last 5 releases if it's hosted on GitHub. The check does not verify the signatures. It does not currently support other source hosting repositories (forges) other than GitHub."
              },
              "help": {
                "text": "This check tries to determine if the project cryptographically signs release artifacts.\nSigned releases attest to the provenance of the artifact. A low score is considered 'High' risk.\nIt works by looking for filenames: *.minisig (https://github.com/jedisct1/minisign), *.asc (pgp), *.sign. for the last 5 releases if it's hosted on GitHub. The check does not verify the signatures. It does not currently support other source hosting repositories (forges) other than GitHub.",
                "markdown": "This check tries to determine if the project cryptographically signs release artifacts.  Signed releases attest to the provenance of the artifact. A low score is considered 'High' risk.  It works by looking for filenames: *.minisig (https://github.com/jedisct1/minisign), *.asc (pgp), *.sign. for the last 5 releases if it's hosted on GitHub. The check does not verify the signatures. It does not currently support other source hosting repositories (forges) other than GitHub."
              },
              "defaultConfiguration": {
                "level": "error"
              },
              "properties": {
                "precision": "high",
                "tags": [
                  "supply-chain",
                  "security",
                  "releases"
                ]
              }
            }
          ]
        }
      },
      "results": [
        {
          "ruleId": "b451a10b1a462197cf89aab8af8a0140",
          "level": "note",
          "ruleIndex": 0,
          "message": {
            "text": "5 out of 5 artifacts are signed -- score normalized to 10"
          },
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": ".github/scorecard.yml",
                  "uriBaseId": "%SRCROOT%"
                }
              }
            }
          ],
          "relatedLocations": [
            {
              "id": 0,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.2.8_checksums.txt.sig"
              }
            },
            {
              "id": 1,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.2.3_checksums.txt.sig"
              }
            },
            {
              "id": 2,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.1.3_checksums.txt.sig"
              }
            },
            {
              "id": 3,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.1.2_checksums.txt.sig"
              }
            },
            {
              "id": 4,
              "physicalLocation": {
                "region": {
                  "startLine": 1
                },
                "artifactLocation": {
                  "uri": "https://github.com/ossf/scorecard/blob/main/actions/entrypoint.sh"
                }
              },
              "message": {
                "text": "signed release artifact: scorecard_2.1.1_checksums.txt.sig"
              }
            }
          ]
        }
      ]
    }
  ]
}

laurentsimon avatar Sep 28 '21 19:09 laurentsimon

The sarif file looks fine to me. I'm able to upload and download it from code scanning. Can you point me to a failing actions run? Presumably, you have an instance where upload-sarif is not working?

aeisenberg avatar Sep 28 '21 20:09 aeisenberg

And code-scanning currently only supports character offset/length and start/end line/columns offsets.

aeisenberg avatar Sep 28 '21 20:09 aeisenberg

re: support for only character offset.. What if a binary file is reported in the results? Shall I set the line/character offset or that won't work?

re: upload-sarif This is the workflow used to test https://github.com/laurentsimon/scorecard-action-test/blob/main/.github/workflows/scorecard-analysis.yml

It works well except when it encounters a URL.

laurentsimon avatar Sep 28 '21 21:09 laurentsimon

I've forwarded your question on to someone on the team who has more experience with how code scanning processes sarif files.

aeisenberg avatar Sep 28 '21 21:09 aeisenberg

Thanks so much!

laurentsimon avatar Sep 28 '21 22:09 laurentsimon

And code-scanning currently only supports character offset/length and start/end line/columns offsets.

I just tested using a ByteOffset with a text file and it worked. When using a true non-ASCII binary, it did report an error but said Preview unavailable - Sorry, we couldn't find this file in the repository. This is actually good enough.

Looking forward to your guidance nevertheless.

One thing that would benefit UX wise is reporting all locations in the dashboard, instead of only the first one.

laurentsimon avatar Sep 28 '21 22:09 laurentsimon

:wave:

Code Scanning assumes that the locations are for files in the repository. As a result it is trying to relativize your URLs against a file:// URL that is the directory on which the repository is checked out on the Actions runner.

Using a relative URL, as you have done in the locations, is the easiest way to work with Code Scanning. Would this work for the relatedLocations as well?

One thing that would benefit UX wise is reporting all locations in the dashboard, instead of only the first one.

Are you talking about the relatedLocations? Can you help me to understand what the relatedLocations represent in your output file? I'll then be able to make some recommendations or take feedback to the product team.

I just tested using a ByteOffset with a text file and it worked. When using a true non-ASCII binary, it did report an error but said Preview unavailable - Sorry, we couldn't find this file in the repository. This is actually good enough.

Right now we just support Start/EndLine and Start/EndColumn because Code Scanning is designed to work with source code. The result view highlights a particular line range in a text file and we haven't thought about how we would display or highlight a range in a binary file. For now I think your best bet is to leave out the line/column information.

rneatherway avatar Sep 29 '21 16:09 rneatherway

👋

Code Scanning assumes that the locations are for files in the repository. As a result it is trying to relativize your URLs against a file:// URL that is the directory on which the repository is checked out on the Actions runner.

Using a relative URL, as you have done in the locations, is the easiest way to work with Code Scanning. Would this work for the relatedLocations as well?

not really. Our tool outputs information about assets and release, so links like https://api.github.com/repos/ossf/scorecard/releases/assets/41580436, https://github.com/ossf/scorecard/releases/download/v2.1.1/scorecard_2.1.1_checksums.txt.sig are the one we wanted to insert for related locations. It would be good enough for a clickable link to show. Those links are useful for users to download and inspect if they wish. My suggestion would be that for links that do not correspond to a file in the repo, you can simply show a clickable link.

One thing that would benefit UX wise is reporting all locations in the dashboard, instead of only the first one.

Are you talking about the relatedLocations? Can you help me to understand what the relatedLocations represent in your output file? I'll then be able to make some recommendations or take feedback to the product team.

No, I was really talking about locations: they are an array in the standard. These tutorials https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#locations says is an array because sometimes you have to make changes in more than one place to fix a problem

I just tested using a ByteOffset with a text file and it worked. When using a true non-ASCII binary, it did report an error but said Preview unavailable - Sorry, we couldn't find this file in the repository. This is actually good enough.

Right now we just support Start/EndLine and Start/EndColumn because Code Scanning is designed to work with source code. The result view highlights a particular line range in a text file and we haven't thought about how we would display or highlight a range in a binary file. For now I think your best bet is to leave out the line/column information.

laurentsimon avatar Sep 30 '21 16:09 laurentsimon

not really. Our tool outputs information about assets and release, so links like https://api.github.com/repos/ossf/scorecard/releases/assets/41580436, https://github.com/ossf/scorecard/releases/download/v2.1.1/scorecard_2.1.1_checksums.txt.sig are the one we wanted to insert for related locations. It would be good enough for a clickable link to show. Those links are useful for users to download and inspect if they wish. My suggestion would be that for links that do not correspond to a file in the repo, you can simply show a clickable link.

Thanks, that makes sense. I'll open an issue to consider how we handle URLs internally.

In SARIF, relatedLocations are normally used by including a markdown-style link of the form [link text](N) where N is the index of the relatedLocation you wish to reference. You can see an example here: https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Ref493499246

Because of the problem with accepting https URLs you won't currently be able to do this, but you could short-cut the process and directly insert a markdown link into the result message.

No, I was really talking about locations: they are an array in the standard. These tutorials https://github.com/microsoft/sarif-tutorials/blob/main/docs/2-Basics.md#locations says is an array because sometimes you have to make changes in more than one place to fix a problem

Yes, this is a current limitation of our SARIF support, which would be hard to change because of the way the alert view works. Our current suggestion would be either to create multiple results or to have a single result and use relatedLocations to reference the other places.

rneatherway avatar Oct 01 '21 13:10 rneatherway

Please do keep us up to date with your development! We would be happy to review and offer further suggestions and guidance.

rneatherway avatar Oct 01 '21 13:10 rneatherway

Thank you so much for the prompt help. Kudos to your team! I'll make sure to create new issues when I encounter problems. Thanks!

laurentsimon avatar Oct 04 '21 18:10 laurentsimon

You're welcome! :grin:

I'll close this out now, but as I said, very happy to discuss other points in the future. I'll also aim to update you here if we change the behaviour of our absolute URL handling.

rneatherway avatar Oct 11 '21 12:10 rneatherway

@rneatherway any updates on this topic? Having the same issue with http. In blackbox scanning, like a ZAP scan, a file location isn't useful. Any suggestions for workarounds?

elgohr avatar Oct 09 '23 12:10 elgohr

When this feature was originally suggested, we did not have enough resources to implement it. I am asking again to see if it is something worth implementing now.

aeisenberg avatar Oct 09 '23 15:10 aeisenberg

Unfortunately, we do not have any capacity to work on this right now. It's on our backlog and we may decide to pick it up in the future.

aeisenberg avatar Oct 10 '23 16:10 aeisenberg

Could we keep this issue open until implemented?

elgohr avatar Oct 10 '23 17:10 elgohr

Also running into this issue - I'm wondering if it's possible to ignore and continue for https URIs in the short-term until the team settles on an appropriate way to handle them.

e.g. if I develop a workflow that has relative file:// uris for some artifacts & https:// uris for others, to produce a valid SARIF file (per spec) it would be great if the API would accept my valid SARIF file & just handle the file:// uris within it as normal.

At present, we're looking at pre-processing the SARIF to exclude the https:// artifacts (leaving us with an incomplete SARIF), just to satisfy Github's validation - ideally pre-processing wouldn't be required on our side.

NOTE: Another avenue we explored was omitting location selectively for these artifacts, but that yielded the following:

locationFromSarifResult: expected at least one location

So this leaves no option but to completely omit results from the SARIF for which the artifact can't be resolved locally by Github's API. That's not really tenable.

edwardgalligan avatar May 29 '24 13:05 edwardgalligan