octokit.net
IssuesEventsClient.GetAllForRepository behavior
I have two repos being serviced by an implementation of Octokit 0.20. One holds ~600 issues and is actively used; the other is a test bed for pre-production testing.
Around 6am PDT, Octokit started reporting the following and failing to complete an IIssuesEventsClient.GetAllForRepository() call:
Unhandled Exception: System.AggregateException: One or more errors occurred. ---> Octokit.ApiValidationException: In order to keep the API fast for everyone, pagination is limited for this resource. Check the rel=last link relation in the Link response header to see how far back you can traverse. at Octokit.Connection.HandleErrors(IResponse response) in G:\Islandwood\octokit.net\Octokit\Http\Connection.cs:line 563
on our production system. We hadn't made any changes to the OctoKit implementation that would affect this.
Parsing the error, I might expect our nine-month-old production repo to start hitting a pagination limit. However, the test repo has only existed for about two months and gets the same error when I attempt to pull its events, and it only has 206 events at this point!
According to: http://octokitnet.readthedocs.io/en/latest/extensibility/
...default pagination should deliver 30 events at a time. I'm seeing far more calls than that implies: the number of calls went from 6 to over a thousand, eventually ending with the error above. My suspicion is that the error is a product of Octokit attempting to pull a page beyond a particular limit.
This call had been working fine up until this morning. While I know I can simply call individual pages until the events equal zero, I'd really prefer to continue using GetAllForRepository without ApiOptions.
From memory, someone mentioned on Gitter chat that there may have been some changes to API rate limits.
Have you tried passing an ApiOptions to the GetAll call and specifying a high records-per-page size to override whatever the default is? That way, if the default was changed to something lower but the API still allows you to request more per page, you might be OK.
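A minimal sketch of that suggestion, assuming the ApiOptions overload of GetAllForRepository (the owner/repo names and product header are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Octokit;

class PageSizeExample
{
    static async Task Main()
    {
        // Placeholder product header; substitute your own application name.
        var github = new GitHubClient(new ProductHeaderValue("my-app"));

        // Request 100 records per page (the GitHub API maximum) so far
        // fewer round trips are needed to traverse the same events.
        var options = new ApiOptions { PageSize = 100 };

        var events = await github.Issue.Events.GetAllForRepository(
            "some-owner", "some-repo", options); // placeholder repo

        Console.WriteLine($"Fetched {events.Count} events");
    }
}
```

With PageSize raised from the default 30 to 100, a repo with ~206 events should take 3 calls instead of 7, and a larger repo stays further away from whatever pagination ceiling the API enforces.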
In general I was bumping up against some of the same types of issues: as repositories grow older/larger, our query performance (mainly around generating release notes between any two arbitrary points) started to suffer. I was able to make some efficiency improvements with some of my queries (e.g. instead of loading all PRs for the repo, I used the issue search API to find issues of type PR that were merged between the commit timestamps of the from/to commits I wanted release notes for) and then loaded those ones with individual Get calls, but in parallel... this scales much better in a repo with growing numbers of PRs.
So in your particular use case, what is the outcome you are trying to achieve? There may be ways we can suggest to make your code more scalable as repositories grow larger, rather than always retrieving every single issue event since the beginning of time.
E.g. another thing I do for my release notes: I need to know the merge commit of the PRs, which you can only find from the "merge" event. Originally I used to get all issue events for the whole repo, but this grew too slow (I didn't run into your issue, just really lengthy query times), so with the above optimisations I now only run this query for the singular issues (PRs) I know I am interested in. By using Tasks to run things in parallel, I achieved a pretty scalable query that is only affected by the number of PRs in a release, rather than the size of the repo.
Or, if you are using the issue events to know "what is happening" in the repo, you could instead configure webhooks to hit your service so you are told of each event as it happens, and can process events in the singular context rather than "all events for the whole repo".
Here's a gist showing the approach I mentioned - using parallel tasks to load a smaller set of individual Get calls, rather than using GetAll calls. https://gist.github.com/ryangribble/c91ea7ca54cff6907a5c2cb8025f3579
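A rough sketch of that pattern (not the gist itself; the issue numbers, repo names, and product header are placeholders, and the exact event-comparison API varies slightly across Octokit versions):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Octokit;

class ParallelEventsExample
{
    static async Task Main()
    {
        var github = new GitHubClient(new ProductHeaderValue("my-app")); // placeholder

        // Only the PRs/issues we actually care about (e.g. from a search),
        // rather than paging through every event in the repository.
        int[] issueNumbers = { 101, 102, 103 }; // placeholder numbers

        // Kick off one Get call per issue and await them all in parallel.
        var tasks = issueNumbers.Select(number =>
            github.Issue.Events.GetAllForIssue("some-owner", "some-repo", number));
        var results = await Task.WhenAll(tasks);

        foreach (var events in results)
        {
            // E.g. find the "merged" event to recover the merge commit SHA.
            var merged = events.FirstOrDefault(e => e.Event == EventInfoState.Merged);
            Console.WriteLine(merged?.CommitId ?? "(not merged)");
        }
    }
}
```

The cost here scales with the number of issues you query, not the total event count of the repo, which is the scalability property described above.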
A couple of questions I have:
- Are you capturing the rate limiting information when you encounter this error? I'd love to confirm that is reflecting reality so that clients know when they are able to reconnect to the resource.
Here's a quick snippet:
var github = new GitHubClient(new ProductHeaderValue("my-app")); // placeholder header value
// ... after one or more API calls ...
var rateLimit = github.GetLastApiInfo()?.RateLimit;
- Are there other resources you're encountering this on, or is it only Issue Events for Repository?
Brendan, I did in fact collect this data in the process of troubleshooting.
This is from memory and reflects what I saw on Friday.
- Issues GetAllForRepository() can be done without additional ApiOptions added.
- Comments GetAllForRepository() gets stuck in the loop that I see if ApiOptions are not set. However, if I provide a number of pages larger than the number needed as part of the ApiOptions, it only uses the appropriate number of calls to the GitHub API.
- Events GetAllForRepository() gets stuck in the loop that I see if ApiOptions are not set. If I tell the call to get 100 pages – even if the number of pages needed is more than that – it will use 100 calls of my account's 5000/hr to get that information…but return the correct number of events.
I did not observe other GetAllForRepository() calls.
When I let the Events GetAllForRepository() gather without bounds, it sends off ~1100 calls to GitHub, then responds with the error that I saw.
Ryan, good questions. I will get back to you when I get into work on Monday.
@ryangribble - we use Octokit to push GitHub issue data into our internal bug tracking system, allowing those who are looking at larger trends within the organization to not have to jump over to GitHub to understand our project.
GH issues, comments, and events are used to update our tracking bugs. Because GetAllForRepository's ApiOptions only cover page size and number of entries, the initial version of the code asked for the entire repo, then parsed the output down to only the updated comments. Not exactly efficient. Our repo is still small enough to make this possible, and I was looking at better ways to get the work done before the issue I mention above showed up on Friday.
What would be ideal would be to ask only for the comments or events that take place since a certain timestamp (the last hour of comments, for example). It'd be useful to have a similar RepositoryRequest for comments and events as we do for issues.
As you point out, some of this could also be done using RepositoryIssueRequest {Since = ...} and just getting events or comments for those issues. However, it'd also be useful to not have the API misbehave and use too many calls for a particular event request.
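A sketch of that RepositoryIssueRequest workaround, assuming the request overload of GetAllForRepository (owner/repo and product header are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Octokit;

class SinceExample
{
    static async Task Main()
    {
        var github = new GitHubClient(new ProductHeaderValue("my-app")); // placeholder

        // Only issues updated in the last hour, rather than the whole repo.
        var request = new RepositoryIssueRequest
        {
            Since = DateTimeOffset.UtcNow.AddHours(-1)
        };

        var issues = await github.Issue.GetAllForRepository(
            "some-owner", "some-repo", request); // placeholder repo

        foreach (var issue in issues)
        {
            // Then pull comments (or events) per updated issue,
            // instead of repo-wide GetAll calls.
            var comments = await github.Issue.Comment.GetAllForIssue(
                "some-owner", "some-repo", issue.Number);
            Console.WriteLine($"#{issue.Number}: {comments.Count} comments");
        }
    }
}
```

This narrows the per-sync work to issues that actually changed, so call volume tracks recent activity rather than total repository size.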
👋 Hey Friends, this issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Please add the Status: Pinned label if you feel that this issue needs to remain open/active. Thank you for your contributions and help in keeping things tidy!