Use `author` instead of `actor_login` where possible for commit events
What is it?
Our current contributor metric calculates counts based on the `actor_login` field. However, GitHub's approach uses `author_login`, which can lead to discrepancies. In many commit events, the user who authors the commit differs from the user who pushes it, so switching to (or including) `author_login` should align our metrics more closely with GitHub's.
Background:
- Rohit and I investigated the Gitcoin contributor metric and compared our event model against staging data and GitHubKit data.
- We observed that using only `actor_login` can undercount or misrepresent contributor activity, especially when the commit author and pusher differ.
- Previous research (e.g., by @ravenac95) indicated that this nuance is significant and might explain why our numbers differ from GitHub's.
Proposed Changes:
- Modify the logic for counting commit events to use `author_login` instead of `actor_login`.
- Include a fallback mechanism in cases where `author_login` is not available or lacks useful data (referencing historical CASE WHEN conditions).
- Validate the revised metric to ensure that our contributor counts align with GitHub's numbers.
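To make the fallback idea concrete, here is a minimal sketch in Python. It assumes each commit event is a dict with `author_login` and `actor_login` keys. The function names and the set of "empty" placeholder values are hypothetical and are not taken from the historical CASE WHEN conditions.

```python
from typing import Iterable, Optional

# Values treated as "no useful data". This set is an assumption for illustration.
_EMPTY_LOGINS = {"", "unknown", "none", "null"}


def resolve_contributor(author_login: Optional[str],
                        actor_login: Optional[str]) -> Optional[str]:
    """Prefer author_login; fall back to actor_login when it is missing or empty."""
    for candidate in (author_login, actor_login):
        if candidate is not None and candidate.strip().lower() not in _EMPTY_LOGINS:
            return candidate.strip()
    return None


def count_contributors(events: Iterable[dict]) -> int:
    """Count distinct contributors (case-insensitive) across commit events."""
    logins = set()
    for event in events:
        login = resolve_contributor(event.get("author_login"),
                                    event.get("actor_login"))
        if login is not None:
            logins.add(login.lower())
    return len(logins)
```

A commit authored by `alice` but pushed by a bot account is attributed to `alice`. The pusher is only used when no author login is available.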
See analysis examples here
@kenyiu would you be able to do some exploring into this?
From this, it's true that we should use `commits[][author]` (the git author of the commit) rather than the actor from the event object if we want to count contributions the way GitHub does.
https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-github-profile/managing-contribution-settings-on-your-profile/why-are-my-contributions-not-showing-up-on-my-profile#your-local-git-commit-email-isnt-connected-to-your-account
Then I looked into what's in `commits[][author]`. Sadly, it only contains `name` and `email`, not the login. I also verified this against what we have in production.
https://docs.github.com/en/rest/using-the-rest-api/github-event-types?apiVersion=2022-11-28&versionId=free-pro-team%40latest&category=repos#event-payload-object-for-pushevent
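For illustration, a small Python sketch of reading the author fields from a PushEvent. The sample event is made up and trimmed to the relevant keys, and `commit_authors` is a hypothetical helper, not part of our pipeline.

```python
def commit_authors(push_event: dict) -> list:
    """Return (name, email) pairs for each commit in a PushEvent payload."""
    commits = push_event.get("payload", {}).get("commits", [])
    authors = []
    for commit in commits:
        author = commit.get("author") or {}
        # Only name and email are present here; there is no login field.
        authors.append((author.get("name"), author.get("email")))
    return authors


# Made-up sample event, trimmed to the fields discussed above.
sample_event = {
    "type": "PushEvent",
    "actor": {"login": "deploy-bot"},  # the pusher, not the commit author
    "payload": {
        "commits": [
            {
                "sha": "0000000000000000000000000000000000000000",
                "author": {"name": "Jane Doe", "email": "jane@example.com"},
                "message": "Fix typo",
            }
        ]
    },
}

print(commit_authors(sample_event))
```

The only login available on the event is `actor.login`, which identifies the pusher.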
I checked how others link the email to the username, and it should be the following API endpoint: https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-users
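A rough Python sketch of what that lookup could look like, assuming the `in:email` search qualifier and that the author's email is public on their profile (otherwise the search returns nothing). The helper names are hypothetical, and rate limits would need handling before running this at scale.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SEARCH_USERS_URL = "https://api.github.com/search/users"


def build_search_url(email: str) -> str:
    """Build the search-users URL that looks for accounts matching an email."""
    query = f"{email} in:email"
    return SEARCH_USERS_URL + "?" + urllib.parse.urlencode({"q": query})


def lookup_login(email: str, token: Optional[str] = None) -> Optional[str]:
    """Return the login for an email if exactly one user matches, else None."""
    request = urllib.request.Request(
        build_search_url(email),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # A token is optional here, but unauthenticated search is heavily rate limited.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    items = body.get("items", [])
    # Treat zero or multiple matches as "unknown" rather than guessing.
    return items[0]["login"] if len(items) == 1 else None
```

This means one extra API call per distinct email, so caching the email-to-login mapping would likely be needed.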
Since GH Archive only provides events, I believe we would not have access to the login without running extra queries.