gharchive.org

Only three IssuesEvent action types

Open bluecoco opened this issue 7 years ago • 8 comments
On the API documentation page, https://developer.github.com/v3/activity/events/types/#issuesevent, it says the payload.action value for an IssuesEvent can be one of "assigned", "unassigned", "labeled", "unlabeled", "opened", "edited", "milestoned", "demilestoned", "closed", or "reopened". But when we look at the past few years of data, this value is only ever one of three: 'opened', 'closed', and 'reopened'. Are the other action types not captured by GHArchive? Thanks a lot!
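For anyone who wants to reproduce the observation, here is a minimal sketch that tallies payload.action over IssuesEvent records in newline-delimited JSON. The function name and the sample records are made up for illustration; they assume the 2015+ archive layout where each line is one event with top-level "type" and "payload" fields.

```python
import json
from collections import Counter


def count_issue_actions(lines):
    """Tally payload.action over IssuesEvent records in newline-delimited JSON."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != "IssuesEvent":
            continue  # ignore every other event type
        payload = event.get("payload") or {}
        counts[payload.get("action")] += 1
    return counts


# Hypothetical sample records standing in for lines of a real archive file.
sample = [
    '{"type": "IssuesEvent", "payload": {"action": "opened"}}',
    '{"type": "IssuesEvent", "payload": {"action": "closed"}}',
    '{"type": "PushEvent", "payload": {}}',
    '{"type": "IssuesEvent", "payload": {"action": "reopened"}}',
    '{"type": "IssuesEvent", "payload": {"action": "opened"}}',
]

print(count_issue_actions(sample))
```

To run it against real data, the same function should accept the file object returned by `gzip.open(path, "rt")` for a downloaded hourly archive, since that yields one line per event.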

bluecoco avatar Apr 09 '18 18:04 bluecoco

Someone else flagged this to me recently too. Looks like the Events API may be surfacing only a subset of issue transitions. @annafil could you sanity check on your end?

igrigorik avatar Apr 10 '18 16:04 igrigorik

Yes, that's correct @igrigorik.

This is a limitation on the API side, not GHArchive. While the documentation for the API says these events are surfaced, they are not available in the /events endpoint that GHArchive reads from -- only through webhooks. Will ask for this to be clarified in the docs :)

annafil avatar Apr 10 '18 16:04 annafil

@annafil would it be possible to ask in reverse, and add those events to the API? I've heard a few requests for this now.. :)

igrigorik avatar Apr 10 '18 19:04 igrigorik

If it helps, even though the /events stream doesn't include these types of events by default, they are currently available in a slightly different form from the API :)

Each issue has its own events URI in the API, and that endpoint contains the additional issue activity types above. As far as I can see, historical information is still available for those events: e.g. this issue from the very active rails/rails, circa 2011, whose events also include 'assigned': https://api.github.com/repos/rails/rails/issues/411/events. It should therefore be possible to reconstruct activity for issues of particular interest, as long as the repo and issue have not been deleted.
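A rough sketch of what reconstructing one issue's activity could look like, using only the Python standard library. The helper names and the sample records are illustrative, not anything GHArchive ships; the sketch assumes the endpoint returns a JSON array whose items carry "event" and "created_at" fields, and it only reads the first page of results.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def issue_events_url(owner, repo, number):
    """Build the per-issue events URL in the same shape as the link above."""
    return f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/events"


def event_types(events):
    """Reduce decoded event records to (created_at, event) pairs."""
    return [(e.get("created_at"), e.get("event")) for e in events]


def fetch_issue_events(owner, repo, number):
    """Fetch the first page of events, unauthenticated and rate limited."""
    request = urllib.request.Request(
        issue_events_url(owner, repo, number),
        headers={"User-Agent": "issue-events-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical records in the assumed response shape, so the parsing step
# can be tried without touching the network.
sample_events = [
    {"event": "assigned", "created_at": "2011-05-01T10:00:00Z"},
    {"event": "closed", "created_at": "2011-05-02T12:30:00Z"},
]

print(issue_events_url("rails", "rails", 411))
print(event_types(sample_events))
```

Calling `event_types(fetch_issue_events("rails", "rails", 411))` would run the same parsing against the live API, at the cost of one request against the rate limit.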

@igrigorik We could consider updating the crawler to fetch these related events whenever it encounters an issue, to attempt to preserve the historical data around issues better, but I defer to you on whether this is in scope for Archive :)

annafil avatar Apr 10 '18 20:04 annafil

Thank you both for your help!

bluecoco avatar Apr 11 '18 01:04 bluecoco

> @igrigorik We could consider updating the crawler to fetch these related events whenever it encounters an issue, to attempt to preserve the historical data around issues better, but I defer to you on whether this is in scope for Archive :)

How would you see that working? Trigger the extra fetch when an issue is "closed" to backfill? A couple of gotchas come to mind:

  • Presumably issues can be updated even after they're closed, right? We would still miss data.
  • Today the activity is logged into the archive when it is detected, so the fetched data would be "misaligned" with the rest, and backfilling into old gzip archives and BQ tables would add a ton of complexity.
  • We're already up against the API limit. More fetches might make us lose more activity data.
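
To make the last gotcha concrete, here is a minimal sketch of a guard that skips the extra per-issue fetch whenever the remaining quota is too low. It assumes the remaining quota is read from the `X-RateLimit-Remaining` response header; the function name and the `reserve` threshold are hypothetical, and this is not how the crawler currently works.

```python
def can_afford_backfill(headers, reserve=1000):
    """Return True if one more request still leaves at least `reserve` calls.

    `headers` is a plain dict of response headers from the previous API call.
    `reserve` is a hypothetical safety margin kept for the main /events crawl.
    """
    # A missing header is treated as "no budget left" to stay conservative.
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    return remaining - 1 >= reserve


# Hypothetical header snapshots.
print(can_afford_backfill({"X-RateLimit-Remaining": "4200"}))
print(can_afford_backfill({"X-RateLimit-Remaining": "1000"}))
```

A guard like this only protects the main feed from starvation. It does not address the first two gotchas, since skipped backfills are still missing data.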

igrigorik avatar Apr 12 '18 21:04 igrigorik

Similarly, it seems that for pull request events only 'opened', 'closed', and 'reopened' are captured, not others such as 'assigned' or 'unassigned'. Is this also expected?

bluecoco avatar Apr 16 '18 15:04 bluecoco

@igrigorik Very good point about backfilling to the gzip archives and the added complexity. I agree with you that the API limit is a concern, and a big blocker to grabbing more of this data in some systematic way. I suspect one of the reasons these additional events are not available through the /events endpoint is because they're relatively higher in volume than open/closed/reopened events and would make it harder to keep up with the feed.

@bluecoco good question! I would expect a consistent set of events to be put out for both PRs and Issues, so this seems right to me.

annafil avatar Apr 20 '18 21:04 annafil