scorecard
scorecard copied to clipboard
Bug Fix Request: Re-enable 3 checks in cron job
Describe the bug Currently, the CRON pushes scores of 16 checks to bigQuery and does not include scores for the following 3 checks CI_TEST, SAST, and CONTRIBUTOR.
Expected behavior We expect all the scores to be available in the processed data dump. If the scorecards application was unable to compute it then possibly assigning a score value like -1 or any other pre-defined value can help us ignore those scores.
Additional context I can think of the following 2 reasons -
A) There was a design decision taken to exclude these metrics. That's might be one of the reasons why previous issues that asked to re-enable this https://github.com/ossf/scorecard/issues/1302 were put to backlog.
B) This has been missed out to include somehow. If that is the case, can we enable those now?
Some of these checks were taken out because they are API intensive so rate limiting can be a bottleneck. @azeemsgoogle any plan to add these checks?
Disabling these checks is a conscious decision for being able to scale the weekly job (see https://github.com/ossf/scorecard/blob/main/cron/config/config.yaml#L27). We need to optimize these checks for API usage before we enable them in the cron job. As of now, I don't plan to work on adding these back. Happy if someone wants to take this on.
We should probably explicitly call out in our documentation though that not all checks are run during the weekly cron job.
We could add an extra column "Run in weekly scan" in the https://github.com/ossf/scorecard#scorecard-checks. @Parth59 interested in sending a quick PR for it?
I can take a look at this
My plan is to expand the graphQL API call to fetch all the check info in the initial API call. Something like this:
query {
repository(owner:"ossf", name:"scorecard") {
object(expression: "heads/main") {
... on Commit {
checkSuites(first: 10) {
edges {
node {
app{
slug
}
conclusion
status
}
}
}
}
}
}
}
which will produced the same info used by Client.Checks.ListCheckRunsForRef
{
"node": {
"app": {
"slug": "dependabot"
},
"conclusion": null,
"status": "QUEUED"
}
}
In CI-Tests, the issue comes from this for loop - https://github.com/ossf/scorecard/blob/2b206dc3654ab96b88310625c5f0a1a3902723e7/checks/ci_tests.go#L53-L54
We basically need to make the ListChecksRunsForRef call for all commits returned by ListCommits which is what makes this check expensive.
The other issue is - we call ListCheckRunsForRef and ListStatuses on pr.HeadSHA, not on the merge commitSHA. If we were to look for the statuses on the merge commit it changes the meaning of the check - instead of checking whether CI-Tests ran before a PR got merged, we look for whether CI-Tests ran after the PR got merged.
We basically need to make the ListChecksRunsForRef call for all commits returned by ListCommits which is what makes this check expensive
Yep. That's what I was hoping to avoid by expanding graphqlData to include checkSuites so we can get it all in one call instead of 30 (or whatever commitsToAnalyze is set) API calls. I think this would still increase the cost of the query though. I'll play around with if this is an improvement or not.
instead of checking whether CI-Tests ran before a PR got merged, we look for whether CI-Tests ran after the PR got merged
I'll keep this in mind to make sure to preserve existing behavior
I think this would still increase the cost of the query though. I'll play around with if this is an improvement or not.
Yes, although graphQL cost might still be better than 30x REST API calls. You can use the e2e tests to understand how a change affects the query cost - https://github.com/ossf/scorecard/blob/main/clients/githubrepo/graphql_e2e_test.go. Note that some increase in cost is ok and expected.
Luckily the Github GraphQL API Explorer also provides cost info. Getting the checks run on normal commits is pretty cheap (cost: 1). ~~The way I'm getting the checks for each pr.HeadSHA is not (currently cost: 18, but still experimenting).~~
Had a misconfiguration. Continuing under a cost: 1 scenario
The SAST check has been enabled as part of #2223. The next BigQuery cron job to pick it up will be the 9/5/22 run and the data should be available by 9/12/22
SAST fixed; still need to address Contributor, CI Test, Dependency Update