Use `author` instead of `actor_login` where possible for commit events

Open ccerv1 opened this issue 9 months ago • 1 comments

What is it?

Our current contributor metric calculates counts based on the actor_login field. However, GitHub’s approach uses the author_login, which can lead to discrepancies. In many commit events, the user who authors the commit differs from the user who pushes the commit, so switching to (or including) author_login should align our metrics more closely with GitHub's.

Background:

  • Rohit and I investigated the Gitcoin contributor metric and compared our event model against staging data and GitHubKit data.
  • We observed that using only actor_login can undercount or misrepresent contributor activity, especially when the commit author and pusher differ.
  • Previous research (e.g., by @ravenac95) indicated that this nuance is significant and might explain why our numbers differ from GitHub's.

Proposed Changes:

  • Modify the logic for counting commit events to use author_login instead of actor_login.
  • Include a fallback mechanism in cases where author_login is not available or lacks useful data (referencing historical CASE WHEN conditions).
  • Validate the revised metric to ensure that our contributor counts align with GitHub’s numbers.
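The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual model: the function names, the dict-shaped event rows, and the `PLACEHOLDER_LOGINS` set are all hypothetical, and the real "lacks useful data" conditions would come from the historical CASE WHEN logic.

```python
from typing import Iterable, Optional

# Hypothetical values treated as "no useful data"; the real conditions
# would be taken from the historical CASE WHEN logic.
PLACEHOLDER_LOGINS = {"", "unknown", "none"}


def contributor_key(author_login: Optional[str], actor_login: Optional[str]) -> Optional[str]:
    """Prefer the commit author; fall back to the event actor."""
    for candidate in (author_login, actor_login):
        if candidate is None:
            continue
        cleaned = candidate.strip()
        if cleaned.lower() not in PLACEHOLDER_LOGINS:
            return cleaned
    return None  # neither field is usable


def count_contributors(events: Iterable[dict]) -> int:
    """Count distinct contributors across commit event rows."""
    keys = {contributor_key(e.get("author_login"), e.get("actor_login")) for e in events}
    keys.discard(None)
    return len(keys)


events = [
    {"author_login": "alice", "actor_login": "bob"},    # author differs from pusher
    {"author_login": None, "actor_login": "bob"},       # falls back to actor
    {"author_login": "alice", "actor_login": "carol"},  # same author, other pusher
]
print(count_contributors(events))
```

Counting by `actor_login` alone would report bob and carol here, while the author-first key reports alice and bob, which is the kind of discrepancy described in the background.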

See analysis examples here

ccerv1 avatar Apr 11 '25 02:04 ccerv1

@kenyiu would you be able to do some exploring into this?

ccerv1 avatar Jun 23 '25 16:06 ccerv1

According to this GitHub doc, it's true that if we want to count the way GitHub does, we should use commits[][author], the git author of the commit, rather than the actor from the event object. https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-github-profile/managing-contribution-settings-on-your-profile/why-are-my-contributions-not-showing-up-on-my-profile#your-local-git-commit-email-isnt-connected-to-your-account

Then I looked into what's in commits[][author]. Sadly, it contains only name and email, not login. I also verified this against what we have in production. https://docs.github.com/en/rest/using-the-rest-api/github-event-types?apiVersion=2022-11-28&versionId=free-pro-team%40latest&category=repos#event-payload-object-for-pushevent
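To make the payload shape concrete, here is a small sketch that pulls the author fields out of a PushEvent. The sample event is made up and trimmed to the relevant keys, and `commit_authors` is a hypothetical helper name.

```python
def commit_authors(push_event: dict) -> list:
    """Return (name, email) for each commit in a PushEvent payload."""
    commits = push_event.get("payload", {}).get("commits") or []
    authors = []
    for commit in commits:
        author = commit.get("author") or {}
        # Only name and email are present here; there is no login field.
        authors.append((author.get("name"), author.get("email")))
    return authors


# Made-up event, reduced to the fields discussed in this thread.
sample_event = {
    "type": "PushEvent",
    "actor": {"login": "pusher-login"},
    "payload": {
        "commits": [
            {"sha": "abc123", "author": {"name": "Alice Example", "email": "alice@example.com"}},
        ]
    },
}
print(commit_authors(sample_event))
```

The actor carries a login while each commit author carries only a name and an email, so the two cannot be compared directly.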

I checked how others link the email to the username, and it should be the following API endpoint: https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-users

Since GH Archive only provides events, I believe we would not have access to the login without running extra queries.
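A rough sketch of what resolving an email to a login might look like. Two assumptions to flag: the helper names are hypothetical, and the first helper relies on GitHub's noreply address format (`ID+login@users.noreply.github.com`, or `login@users.noreply.github.com` for older accounts), which only covers authors who commit with that address. The second helper just builds a search-users URL with the `in:email` qualifier and sends no request; that search only matches emails users have made public.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Assumed noreply format: optional numeric ID and "+", then the login.
NOREPLY_RE = re.compile(
    r"^(?:\d+\+)?([A-Za-z0-9-]+(?:\[bot\])?)@users\.noreply\.github\.com$",
    re.IGNORECASE,
)


def login_from_noreply(email: str) -> Optional[str]:
    """Recover the login from a GitHub noreply address, if it is one."""
    match = NOREPLY_RE.match(email.strip())
    return match.group(1) if match else None


def search_users_url(email: str) -> str:
    """Build (but do not send) a search-users request URL for an email."""
    return "https://api.github.com/search/users?" + urlencode({"q": f"{email} in:email"})


print(login_from_noreply("12345+octocat@users.noreply.github.com"))
print(login_from_noreply("alice@example.com"))
print(search_users_url("alice@example.com"))
```

The first step needs no network access and could run directly over GH Archive data; anything it cannot resolve would still require the extra queries mentioned above, subject to API rate limits.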

kenyiu avatar Jun 24 '25 16:06 kenyiu