✨ Support more SAST tools
This PR addresses https://github.com/ossf/scorecard/issues/1420, where we came to the conclusion that we want more granularity in the SAST check. This PR does not contain unit tests yet, because there's a lot to discuss and I want to see if we're all OK with these changes.
WARNING: please ignore the comments and the calls to Println(); I will clean them up once we've agreed everything else is fine.
This PR looks for:
- Linter tools run on each PR; 3 points are awarded if that's the case
- Code analysis tools run on each PR; 4 points are awarded if that's the case
- Supply-chain analysis tools run on each PR; 3 points are awarded if that's the case. Since scorecard is the only supply-chain tool at the moment and we don't advertise PR support, we award the 3 points if scorecard is enabled, even if it's not run on PRs.
This PR gives no additional points if a tool is declared in a workflow but not run on PRs - with the exception of scorecard above.
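To make the proposal concrete, here is a rough Go sketch (the repoAnalysis type and its fields are invented for illustration, not scorecard's actual code):

```go
// repoAnalysis is a hypothetical summary of what the check detected.
type repoAnalysis struct {
	linterRunsOnAllPRs       bool
	codeAnalysisRunsOnAllPRs bool
	scorecardEnabled         bool
}

// sastScore sketches the scoring proposed above; tools declared in a
// workflow but not run on PRs earn nothing (except scorecard).
func sastScore(r repoAnalysis) int {
	score := 0
	if r.linterRunsOnAllPRs {
		score += 3 // linter tools run on every PR
	}
	if r.codeAnalysisRunsOnAllPRs {
		score += 4 // code-analysis tools run on every PR
	}
	// Scorecard is the only supply-chain tool today and PR runs aren't
	// advertised, so merely being enabled is enough for these points.
	if r.scorecardEnabled {
		score += 3
	}
	return score // 3 + 4 + 3 = 10 max
}
```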
closes https://github.com/ossf/scorecard/issues/1380 closes https://github.com/ossf/scorecard/issues/1420 closes https://github.com/ossf/scorecard/issues/1580
Great, but we need tests!
There's so much to discuss before I add the tests. I'll add them at the end.
cc @evverx
Linter tools run on each PR; 3 points are awarded if that's the case

Supply-chain analysis tools run on each PR; 3 points are awarded if that's the case
I don't think that linters and supply-chain tools (combined) should outweigh SAST tools like LGTM, CodeQL or Coverity, so it seems to me that they should be downgraded to 1 or something like that. I mean, systemd for example uses superlinter and it finds issues from time to time, but I'm not sure it would be fair to say that in terms of security it's almost as important as LGTM or Coverity. It wouldn't be fair if projects using only LGTM received 4 while projects using only superlinter received 3, either.
I'll try to take a closer look later. Thanks!
This PR gives no additional points if a tool is declared in a workflow but not run on PRs
It's not always feasible to run SAST tools on every PR, but if they are run daily or weekly that's still useful and should be rewarded, I think. For example, it takes CodeQL about 40 minutes to analyze systemd, so it doesn't make much sense to clog its GHActions pool on every PR, but it's still run daily.
systemd runs LGTM and superlinter on every PR, CodeQL daily and Coverity Scan daily and with this PR applied its score went from 10 to 6. I'm not sure it's what it should get :-)
FWIW, the curl score went from 7 to 0 with this PR applied.
Anyway, on the whole I think I like the idea of putting various tools into separate groups and assigning different "scores" to them. Though the way those "scores" are assigned depending on the group, frequency and so on needs tweaking I think. Thanks!
Will take a closer look at this tomorrow. Overall LGTM.
Thanks for the feedback @evverx. I totally agree we need to tweak the score, and also that tools that run on commits/schedules should be rewarded. Maybe give 70% of the points if run on commits vs PRs. Feel free to suggest.
systemd runs LGTM and superlinter on every PR, CodeQL daily and Coverity Scan daily and with this PR applied its score went from 10 to 6. I'm not sure it's what it should get :-)
you'd actually get a 10 if you install scorecard as an action. It suffices for one linter and one static code analysis tool (LGTM in your case) to be run on all PRs, plus scorecard installed, to get a 10. We don't advertise scorecard on PRs because it currently requires a (read) PAT to be made available on PRs. That said, I don't disagree that we need to reward tools run on commits or on schedules.
Maybe give 70% of the points if run on commits vs PRs
I think it depends on how frequent those runs are. If projects run SAST tools daily (or a few times a day) I think they should be scored higher than projects with weekly or monthly runs.
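One way to sketch that idea (the classification and multipliers below are made up for the example, not a proposal for exact values):

```go
// runFrequency is a hypothetical classification of how often a tool runs.
type runFrequency int

const (
	onEveryPR runFrequency = iota
	daily
	weekly
	monthly
)

// weight sketches the idea above: full credit for PR runs, less credit the
// less frequent the runs. The exact multipliers are illustrative only.
func weight(f runFrequency) float64 {
	switch f {
	case onEveryPR:
		return 1.0
	case daily:
		return 0.7 // e.g. the "70% of the points" suggestion above
	case weekly:
		return 0.4
	default:
		return 0.2
	}
}
```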
you'd actually get a 10 if you install scorecard as an action.
I get that :-) but it's complicated, because on the one hand I think scorecard is helpful, but on the other hand I don't want to promote OSSF. Other than that, looking at a couple of checks that have already been introduced and issues where new checks are outlined, it seems to me that scorecard is moving towards "consumers" instead of "producers" (which is understandable), but I'm not sure it will make sense for "producers" to keep it upstream at some point (time will tell, I think). Just to clarify, I have absolutely no problems with this direction, but tools catering to "consumers" somehow shift things that should be done by "consumers" to "producers" (the latest example would be https://github.com/google/oss-fuzz/issues/7146), and generally I wouldn't want to annoy maintainers.
LGTM. Two open issues remain: unit tests and addressing @evverx's comment.
Stale pull request message
Update: I've now tweaked the logic for score computation as follows:
- Linters are run on all PRs. Linters are cheap, so this seems acceptable. 1 point is awarded if that's the case.
- Supply-chain tools are used. We don't check if they're run on all PRs; as long as a tool is defined in a workflow and/or run on at least one commit, we award 2 points for it.
- Static analysis. Same as for supply-chain, and we award 5 points. If one tool is run on all PRs, we give an additional 2 points.
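As a rough sketch of these updated weights (again with invented placeholder types, not the real implementation):

```go
// repoAnalysisV2 is a hypothetical summary for the revised scoring.
type repoAnalysisV2 struct {
	linterRunsOnAllPRs         bool
	supplyChainToolUsed        bool
	staticAnalysisUsed         bool
	staticAnalysisRunsOnAllPRs bool
}

func sastScoreV2(r repoAnalysisV2) int {
	score := 0
	if r.linterRunsOnAllPRs {
		score += 1 // linters are cheap, so PR runs are required
	}
	if r.supplyChainToolUsed {
		score += 2 // defined in a workflow and/or run on at least one commit
	}
	if r.staticAnalysisUsed {
		score += 5 // same presence rule as supply-chain
		if r.staticAnalysisRunsOnAllPRs {
			score += 2 // bonus for running on every PR
		}
	}
	return score // 1 + 2 + 5 + 2 = 10 max
}
```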
Wdyt?
Supply-chain tools are used. We don't check if they're run on all PRs; as long as a tool is defined in a workflow and/or run on at least one commit, we award 2 points for it.
I think supply chain tools should follow the same rules as static analyzers in the sense that if they can't be run on PRs they should be downgraded to 1. It should help to prompt the developers of those tools to make them compatible with the PR workflow (which is supposed to catch issues as early as possible).
Good point. I'll do that then. Thank you for your feedback!
thoughts on rate limiting
https://docs.github.com/en/rest/overview/resources-in-the-rest-api#requests-from-github-actions
When using GITHUB_TOKEN, the rate limit is 1,000 requests per hour per repository. For requests to resources that belong to an enterprise account on GitHub.com, GitHub Enterprise Cloud's rate limit applies, and the limit is 15,000 requests per hour per repository.
That's fine since each commit should have a different token.
For PAT, https://docs.github.com/en/developers/apps/building-github-apps/rate-limits-for-github-apps
GitHub Apps making server-to-server requests use the installation's minimum rate limit of 5,000 requests per hour. If an application is installed on an organization with more than 20 users, the application receives another 50 requests per hour for each user. Installations that have more than 20 repositories receive another 50 requests per hour for each repository. The maximum rate limit for an installation is 12,500 requests per hour.
For a repository like systemd/systemd, we'd need 30*6 ≈ 180 API requests, so let's say 300 for each push. 10-15 commits per hour should be fine.
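A quick back-of-the-envelope check of those numbers (the per-commit call count and the 300-requests-per-push budget are taken from the estimate above, not measured):

```go
// Back-of-the-envelope check of the rate-limit estimate above.
package main

import "fmt"

func main() {
	const (
		commitsPerRun  = 30   // commits analyzed per run (assumption above)
		callsPerCommit = 6    // API calls per commit (assumption above)
		budgetPerPush  = 300  // 30*6 = 180, padded with headroom
		patLimit       = 5000 // PAT/GitHub App requests per hour
	)
	fmt.Println(commitsPerRun * callsPerCommit) // 180
	fmt.Println(patLimit / budgetPerPush)       // 16, so 10-15 pushes/hour fit
}
```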
FYI, we hope to be able to support GITHUB_TOKEN in the future. For the cron job, it's highly unlikely we'll ever be able to run the SAST check if we apply the changes proposed here.
@azeemshaikh38 any further thoughts?
For the cron job, it's highly unlikely we'll ever be able to run the SAST check if we apply the changes proposed here.
FWIW Looking at https://metrics.openssf.org/grafana/d/default/metric-dashboard?orgId=1&var-PackageURL=pkg%3Agithub/systemd/systemd it seems it doesn't work even without this PR (assuming the cron job is responsible for updating that dashboard)
You're correct, we currently don't run it. FYI, the data in the dashboard is out of date. cc @scovetta @inferno-chromium
FWIW @laurentsimon, with https://github.com/systemd/systemd/commit/74b781de2545c80b988981aace9e04148cf3dfe3 merged, it seems scorecard can no longer detect super-linter. As far as I understand, the check isn't aware of the slim image: https://github.com/github/super-linter#slim-image
Though looking at
{
  "details": [
    "Info: code-analysis tool 'CodeQL' detected in workflow: .github/workflows/codeql-analysis.yml",
    "Warn: linter tool run on 0 commits out of 15 commits",
    "Warn: supply-chain tool run on 0 commits out of 15 commits",
    "Info: code-analysis tool run on the last 15 commits"
  ],
  "reason": "SAST category of tools are not used%!(EXTRA string=code-analysis, string=supply-chain)",
  "name": "SAST",
it seems that either the commit-based analysis doesn't work as expected or it just doesn't look at PRs there. Anyway, I'm not sure why it stopped working.
This might be a similar issue to https://github.com/ossf/scorecard/issues/1580#issuecomment-1027946049. We use GitHub Search APIs for detecting SAST tools, and GitHub's Search APIs are likely unreliable. I tried running Scorecard locally on systemd and the results varied. We should probably consider updating the Search API in RepoClient to use locally available code instead. @laurentsimon fyi.
Thanks for the info @evverx. Sorry, this PR has been on the back burner for a while :/ Re: the search API in repoClient: @azeemshaikh38, do we need this API at all, or can we just let checks use the file listing API https://github.com/ossf/scorecard/blob/main/checks/fileparser/listing.go?
We use GitHub Search APIs for detecting SAST tools, and GitHub's Search APIs are likely unreliable.
Could it be that because of that the "code review" check is kind of broken as well?
"repo": {
"name": "github.com/systemd/systemd",
"commit": "d74da762a3b1edbb7935c3b3a229db601f734125"
},
"scorecard": {
"version": "4.0.1-86-geb0730a",
"commit": "eb0730ae797377d24676cdedfa27216f0d649091"
},
...
{
"details": null,
"score": 4,
"reason": "GitHub code reviews found for 13 commits out of the last 30 -- score normalized to 4",
"name": "Code-Review",
"documentation": {
"url": "https://github.com/ossf/scorecard/blob/eb0730ae797377d24676cdedfa27216f0d649091/docs/checks.md#code-review",
"short": "Determines if the project requires code review before pull requests (aka merge requests) are merged."
}
Re: the search API in repoClient: @azeemshaikh38, do we need this API at all, or can we just let checks use the file listing API https://github.com/ossf/scorecard/blob/main/checks/fileparser/listing.go?
It would be nice to support a generic Search API, especially for cases that require searching large repos like oss-fuzz, as we do today. Having said that, I'm open to trying out the ListFiles and GetFileContent APIs for solving the SAST checks if that gets us more consistency.
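For illustration, a sketch of what a listing-based approach could look like (the repoClient interface below only approximates the ListFiles/GetFileContent shape, and the substring match is deliberately naive):

```go
package checks

import "strings"

// repoClient approximates the ListFiles/GetFileContent shape discussed
// above; this is a sketch, not the actual scorecard interface.
type repoClient interface {
	ListFiles(predicate func(string) (bool, error)) ([]string, error)
	GetFileContent(filename string) ([]byte, error)
}

// findToolInWorkflows scans workflow files via file listing instead of the
// GitHub Search API, avoiding the Search API's inconsistency.
func findToolInWorkflows(c repoClient, tool string) (bool, error) {
	files, err := c.ListFiles(func(path string) (bool, error) {
		return strings.HasPrefix(path, ".github/workflows/"), nil
	})
	if err != nil {
		return false, err
	}
	for _, f := range files {
		content, err := c.GetFileContent(f)
		if err != nil {
			return false, err
		}
		if strings.Contains(string(content), tool) {
			return true, nil
		}
	}
	return false, nil
}
```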
Also, it seems scorecard (with --verbosity Debug --show-details) no longer shows useful logs that can be used to somewhat dispute its findings. I think it would be great if they could be brought back, given the number of false positives it produces, mainly due to the GH API as far as I can tell.
Could it be that because of that the "code review" check is kind of broken as well?
Code-Review does not use GitHub Search APIs. It looks like #1505 changed the behavior here. Commits before that give systemd/systemd a score of 10 for Code-Review. @laurentsimon could you look into this?
Commits before that give systemd/systemd a score of 10 for Code-Review
I think commits before that are affected by https://github.com/ossf/scorecard/issues/1260, so it doesn't always work as expected either. Anyway, without "debug" logs it's hard to tell what scorecard does and why.
Regarding logs, it appears something like that was reported in https://github.com/ossf/scorecard/issues/95; then it got a bit better, and now those logs seem to be gone again.
Ok, I think I know what's going on. Some PRs are repeated for systemd commits. It's as if the same PR number gets re-used and the latest PR data (including reviews) overwrites the old data. Here's an example:
{
  "oid": "d74da762a3b1edbb7935c3b3a229db601f734125",
  "associatedPullRequests": {
    "nodes": [
      {
        "repository": {
          "owner": {
            "login": "systemd"
          },
          "name": "systemd"
        },
        "number": 22506,
        "title": "gpt-auto: some (primarily cosmetic) fixes to backing block device detection in gpt-auto-generator/sd-device",
        "mergeCommit": {
          "oid": "d74da762a3b1edbb7935c3b3a229db601f734125"
        }
      }
    ]
  }
},
{
  "oid": "d5cb053cd93d516f516e0b748271b55f9dfb3a29",
  "associatedPullRequests": {
    "nodes": [
      {
        "repository": {
          "owner": {
            "login": "systemd"
          },
          "name": "systemd"
        },
        "number": 22506,
        "title": "gpt-auto: some (primarily cosmetic) fixes to backing block device detection in gpt-auto-generator/sd-device",
        "mergeCommit": {
          "oid": "d74da762a3b1edbb7935c3b3a229db601f734125"
        }
      }
    ]
  }
},
Two different commits have the same associated PR, while the PR data itself reflects the latest change. This is why Scorecard finds unreviewed commits: there is no review data associated with these commits.
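For illustration, the overwrite pattern being described would look roughly like this (hypothetical types, not the actual scorecard code):

```go
// commitPR is a hypothetical pairing of a commit with its associated-PR data.
type commitPR struct {
	oid      string // commit SHA
	prNumber int
	reviewed bool // whether review data was found for this commit
}

// Keyed by PR number, two commits sharing PR #22506 collapse into one entry,
// and the later write clobbers the earlier commit's review data:
func indexByPRNumber(items []commitPR) map[int]commitPR {
	m := map[int]commitPR{}
	for _, it := range items {
		m[it.prNumber] = it // same PR number overwrites the previous commit
	}
	return m
}

// Keyed by commit OID, each commit keeps its own entry instead:
func indexByCommit(items []commitPR) map[string]commitPR {
	m := map[string]commitPR{}
	for _, it := range items {
		m[it.oid] = it
	}
	return m
}
```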
The real question is why is this behavior happening on systemd at all? @evverx do you have any clues?