scorecard icon indicating copy to clipboard operation
scorecard copied to clipboard

Bug Fix Request: Re-enable 3 checks in cron job

Open Parth59 opened this issue 3 years ago • 10 comments

Describe the bug Currently, the CRON pushes scores of 16 checks to bigQuery and does not include scores for the following 3 checks CI_TEST, SAST, and CONTRIBUTOR.

Expected behavior We expect all the scores to be available in the processed data dump. If the scorecards application was unable to compute it then possibly assigning a score value like -1 or any other pre-defined value can help us ignore those scores.

Additional context I can think of the following 2 reasons -

A) There was a design decision taken to exclude these metrics. That's might be one of the reasons why previous issues that asked to re-enable this https://github.com/ossf/scorecard/issues/1302 were put to backlog.

B) This has been missed out to include somehow. If that is the case, can we enable those now?

Parth59 avatar May 16 '22 16:05 Parth59

Some of these checks were taken out because they are API intensive so rate limiting can be a bottleneck. @azeemsgoogle any plan to add these checks?

laurentsimon avatar May 16 '22 22:05 laurentsimon

Disabling these checks is a conscious decision for being able to scale the weekly job (see https://github.com/ossf/scorecard/blob/main/cron/config/config.yaml#L27). We need to optimize these checks for API usage before we enable them in the cron job. As of now, I don't plan to work on adding these back. Happy if someone wants to take this on.

We should probably explicitly call out in our documentation though that not all checks are run during the weekly cron job.

azeemshaikh38 avatar May 17 '22 16:05 azeemshaikh38

We could add an extra column "Run in weekly scan" in the https://github.com/ossf/scorecard#scorecard-checks. @Parth59 interested in sending a quick PR for it?

laurentsimon avatar May 17 '22 17:05 laurentsimon

I can take a look at this

spencerschrock avatar Aug 23 '22 18:08 spencerschrock

My plan is to expand the graphQL API call to fetch all the check info in the initial API call. Something like this:

query {
  repository(owner:"ossf", name:"scorecard") {
    object(expression: "heads/main") {
      ... on Commit {
        checkSuites(first: 10) {
          edges {
            node {
              app{
                slug
              }
              conclusion
              status
            }
          }
        }
      }
    }
  }
}

which will produced the same info used by Client.Checks.ListCheckRunsForRef

{
  "node": {
    "app": {
      "slug": "dependabot"
    },
    "conclusion": null,
    "status": "QUEUED"
  }
}

spencerschrock avatar Aug 23 '22 20:08 spencerschrock

In CI-Tests, the issue comes from this for loop - https://github.com/ossf/scorecard/blob/2b206dc3654ab96b88310625c5f0a1a3902723e7/checks/ci_tests.go#L53-L54

We basically need to make the ListChecksRunsForRef call for all commits returned by ListCommits which is what makes this check expensive.

The other issue is - we call ListCheckRunsForRef and ListStatuses on pr.HeadSHA, not on the merge commitSHA. If we were to look for the statuses on the merge commit it changes the meaning of the check - instead of checking whether CI-Tests ran before a PR got merged, we look for whether CI-Tests ran after the PR got merged.

azeemshaikh38 avatar Aug 23 '22 20:08 azeemshaikh38

We basically need to make the ListChecksRunsForRef call for all commits returned by ListCommits which is what makes this check expensive

Yep. That's what I was hoping to avoid by expanding graphqlData to include checkSuites so we can get it all in one call instead of 30 (or whatever commitsToAnalyze is set) API calls. I think this would still increase the cost of the query though. I'll play around with if this is an improvement or not.

instead of checking whether CI-Tests ran before a PR got merged, we look for whether CI-Tests ran after the PR got merged

I'll keep this in mind to make sure to preserve existing behavior

spencerschrock avatar Aug 23 '22 21:08 spencerschrock

I think this would still increase the cost of the query though. I'll play around with if this is an improvement or not.

Yes, although graphQL cost might still be better than 30x REST API calls. You can use the e2e tests to understand how a change affects the query cost - https://github.com/ossf/scorecard/blob/main/clients/githubrepo/graphql_e2e_test.go. Note that some increase in cost is ok and expected.

azeemshaikh38 avatar Aug 23 '22 21:08 azeemshaikh38

Luckily the Github GraphQL API Explorer also provides cost info. Getting the checks run on normal commits is pretty cheap (cost: 1). ~~The way I'm getting the checks for each pr.HeadSHA is not (currently cost: 18, but still experimenting).~~ Had a misconfiguration. Continuing under a cost: 1 scenario

spencerschrock avatar Aug 23 '22 22:08 spencerschrock

The SAST check has been enabled as part of #2223. The next BigQuery cron job to pick it up will be the 9/5/22 run and the data should be available by 9/12/22

spencerschrock avatar Sep 01 '22 17:09 spencerschrock

SAST fixed; still need to address Contributor, CI Test, Dependency Update

lasomethingsomething avatar Apr 04 '24 20:04 lasomethingsomething