scorecard icon indicating copy to clipboard operation
scorecard copied to clipboard

BUG: invalid number of commits reported

Open laurentsimon opened this issue 3 years ago • 7 comments

Running scorecard on https://github.com/danielaparker/jsoncons/commits/master reports

"Info: all commits (3) are checked with a SAST tool

Thera are a lot more commits in the repo, though. I would have expected Scorecard to look at 30 commits (at least that's how it used to work)

laurentsimon avatar Jul 01 '22 21:07 laurentsimon

I'm running at HEAD. This seems to be a problem on most repos I'm testing. Here's another one github.com/Dart-Code/dart-code, reporting a single commit.

laurentsimon avatar Jul 01 '22 21:07 laurentsimon

Scorecard still looks at 30 commits (or whatever commitsToAnalyze is set to). The number being reported refers to the number of commits in the 30 pulled which have a PR with non-zero merge time. This check changed in eeb563be, which may have unintentionally modified the behavior.

if pr.MergedAt.IsZero() {
	continue
}
totalMerged++

The exact numbers depend on the repo when run. When I run against the repo you mentioned today, 30 commits are analyzed and 14 aren't, so I get the following:

all commits (16) are checked with a SAST tool

Looking back in time to when you made the issue, there were a lot of commits which didn't have associated PR's (or at least nothing shows up after master like it does for other commits: one two etc ...

I think clarifying expected behavior is good here. Should we expect Scorecard to look at 30 commits overall, or 30 commits which get included in the analyses?

spencerschrock avatar Jul 18 '22 17:07 spencerschrock

30 commits overall is fine, and that's what we've been doing. I think I got thrown off by the message. It should say X out of Y commits are checked with a SAST tool

I recall there were some problems with graphQL where we may be looking at fewer (than 30 ) commits if there are not enough commits in the lest 3 (?) months or so.

@azeemsgoogle do you remember the exact problem and whether it has been resolved?

laurentsimon avatar Jul 18 '22 18:07 laurentsimon

IIUC, the problem here is that some repos do not squash when they merge a PR. So when a PR with 5 commits gets merged, each of those commits are returned in ListCommits. But the number of PRs associated with these commits is just 1. So checks like SAST, CI-Tests which care about PRs rather than commits seem off.

This is actually a bug IMO (like if a PR had 30 commits, we'd only be analyzing one PR) and we need to fix it. One soln I can think of is to increase commitsToAnalyze to 100 and ignore any commit which does not match its associatedPR's merge commit. Its not foolproof, but probably better than what we have today. Thoughts?

azeemshaikh38 avatar Aug 23 '22 21:08 azeemshaikh38

Tough one. Can the "squashed" status be queried?

laurentsimon avatar Aug 23 '22 21:08 laurentsimon

Can the "squashed" status be queried?

Not AFAIK

azeemshaikh38 avatar Aug 24 '22 01:08 azeemshaikh38

:/ If we find that we're not analyzing enough commits, can we re-query? That starts to become a bit complicated, though

laurentsimon avatar Aug 24 '22 16:08 laurentsimon

This issue seems keeps creeping up. So here's my proposal to fix this:

  1. Update RepoClient.ListCommits() API to return an CommitIter object instead of []Commits.
  2. Every Scorecard check chooses how many commits it cares about analyzing. So in case of Code-Review or CI-Tests checks it'll keep retrieving commits until it reaches 30 (or the required) changesets.

So basically, instead of RepoClient choosing how many commits/PRs to analyze, it'll be the checks themselves.

azeemshaikh38 avatar Mar 14 '23 16:03 azeemshaikh38