
Dealing with GitHub's rate limiting

Open marten-seemann opened this issue 4 years ago • 14 comments

First of all, thank you for this awesome GitHub Action! We're using it to distribute and synchronize workflows across hundreds of repositories (libp2p, IPFS, Filecoin).

When deploying an update, we create hundreds of PRs practically at the same moment, and (understandably) GitHub is not entirely happy about that: it triggers their abuse detection mechanism.

Apparently, there's a Retry-After header that one could use to wait and automatically retry the request: https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits. Any thoughts on implementing a retry function based on this header?

marten-seemann avatar May 30 '21 04:05 marten-seemann

Hi @marten-seemann

Glad you are finding the action useful.

I'll have a go at implementing this. I think it can be done quite easily by leveraging octokit's retry plugin. The plugin appears to respect the Retry-After header in responses.
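
For reference, composing the plugin into an Octokit instance looks roughly like this (a minimal sketch rather than the action's actual code; the class and variable names are illustrative):

    // Sketch: compose @octokit/plugin-retry into an Octokit instance.
    // By default the plugin retries failed requests up to 3 times and
    // honours the Retry-After header when the response provides one.
    import { Octokit } from "@octokit/core";
    import { retry } from "@octokit/plugin-retry";

    const RetryingOctokit = Octokit.plugin(retry);
    const octokit = new RetryingOctokit({ auth: process.env.GITHUB_TOKEN });

    // Requests made through this instance are retried automatically, e.g.
    // await octokit.request("POST /repos/{owner}/{repo}/pulls", { ... });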

peter-evans avatar May 31 '21 00:05 peter-evans

I've added the Octokit retry plugin in a feature branch. The default settings are for it to retry up to 3 times, while respecting the Retry-After header. It would be very helpful if you could try this out to make sure it works in your case. You can try the version of the action in the open pull request by changing the action version to @retry.

        uses: peter-evans/create-pull-request@retry

peter-evans avatar May 31 '21 00:05 peter-evans

Hi @peter-evans, thank you for this super quick reply and the implementation!

I just tried it out, and it looks like we're still running into GitHub's abuse detection mechanism, for example here: https://github.com/protocol/.github/runs/2720817821?check_suite_focus=true. I don't see any log output that would indicate that a retry is happening, but maybe I'm missing something?

marten-seemann avatar Jun 01 '21 17:06 marten-seemann

Ah, I see where the problem is now. The abuse detection is kicking in when the branch is being pushed to the repository with git push, not the calls to the GitHub API to create the pull request as I first thought. The plugin I added only works for the GitHub API calls, not the git operations. Let me investigate how best to retry the git operations.

peter-evans avatar Jun 01 '21 23:06 peter-evans

@marten-seemann I've added logic to retry the git push command. Unfortunately, I don't think there is any way to see the Retry-After header from the git response, but I think the default value is 60 seconds. So I've hardcoded the wait time to 60 seconds, plus up to 10 seconds of jitter.

Let's see if this resolves the problem. If the command is retried it should appear in the logs.
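
Conceptually, the retry wrapper is along these lines (a simplified sketch, not the exact code in the branch; the function name and parameters are illustrative):

    // Simplified sketch: retry `git push` with a fixed 60-second wait plus
    // up to 10 seconds of random jitter between attempts.
    import { exec } from "@actions/exec";

    async function pushWithRetry(remote: string, branch: string, attempts = 3): Promise<void> {
      for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
          await exec("git", ["push", remote, `HEAD:refs/heads/${branch}`]);
          return;
        } catch (error) {
          if (attempt === attempts) throw error;
          // Git exposes no Retry-After value, so assume GitHub's documented
          // default of 60 seconds and add jitter so that parallel jobs don't
          // all retry at the same instant.
          const waitMs = 60_000 + Math.floor(Math.random() * 10_000);
          console.log(`git push failed; retrying in ${Math.round(waitMs / 1000)}s (attempt ${attempt} of ${attempts})`);
          await new Promise((resolve) => setTimeout(resolve, waitMs));
        }
      }
    }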

peter-evans avatar Jun 02 '21 06:06 peter-evans

@marten-seemann I'm periodically checking the runs here to see if there have been any retries during the "Deploy" workflow, but I haven't seen any runs for a while. I'll wait until we can confirm that this solution works before merging it in.

peter-evans avatar Jun 11 '21 23:06 peter-evans

@peter-evans We only run deployments to all ~100 repositories infrequently - they create a lot of noise. The next run will probably be for adding Go 1.17 (which will be released in August), unless something urgent comes up before then. Can we keep this issue open until then?

marten-seemann avatar Jun 16 '21 01:06 marten-seemann

@marten-seemann Sure, no problem. I'm happy to wait until we can confirm that the PR changes work well.

peter-evans avatar Jun 16 '21 01:06 peter-evans

Hi @peter-evans, thank you for your patience!

We did another deployment today, and we ran into rate limits on a large number of jobs, for example here: https://github.com/protocol/.github/runs/3351001762?check_suite_focus=true. I'm not sure if it retried anything, but judging from the execution time of the step (2s) it probably didn't. Do you have any idea why?

marten-seemann avatar Aug 17 '21 13:08 marten-seemann

@marten-seemann

I've looked through all the runs, and what is interesting this time is that none of them triggered abuse detection for the git push of the PR branch, which was the case previously, such as in this run from June: https://github.com/protocol/.github/runs/2720817821?check_suite_focus=true

All of the failures are from the GitHub API call to create the PR, returning this error:

Error: You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.

I found an explanation of the secondary rate limits: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#secondary-rate-limits

These are not the standard rate limits, but additional limits on certain actions to prevent abuse. You can see the response example returns 403 Forbidden. This HTTP code is, by default, not retryable by the plugin-retry.js Octokit plugin. I didn't realise that, and that is why it didn't retry any of the requests. I've updated the retry feature branch to allow retrying 403 error responses.
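
To illustrate, the change amounts to overriding the plugin's list of non-retryable status codes, roughly like this (a sketch, not the branch's exact code; the plugin's default list includes 403 among other client errors, and the exact option shape can vary between plugin versions):

    // Sketch: tell @octokit/plugin-retry that 403 responses may be retried,
    // since GitHub's secondary rate limits return 403 Forbidden.
    import { Octokit } from "@octokit/core";
    import { retry } from "@octokit/plugin-retry";

    const RetryingOctokit = Octokit.plugin(retry);
    const octokit = new RetryingOctokit({
      auth: process.env.GITHUB_TOKEN,
      retry: {
        // 403 is intentionally left out of the non-retryable list.
        doNotRetry: [400, 401, 404, 422],
      },
    });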

There is some further information here: https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-secondary-rate-limits

  • Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
  • If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.
  • Requests that create content which triggers notifications, such as issues, comments and pull requests, may be further limited and will not include a Retry-After header in the response. Please create this content at a reasonable pace to avoid further limiting.

The first two points above are why I think you are being caught by the abuse detection. You are using a PAT created on one user for all the requests, so some are being executed concurrently and are not 1 second apart. There is not much I can do about this in the action other than retry a few times. In your case there is no Retry-After header, either.

You might want to think about redesigning your workflows to execute serially, instead of in parallel. Or, perhaps use multiple PATs created on different users.
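
For example, if the PR creation were driven from a single script rather than many parallel workflow runs, pacing it serially could be as simple as this (an illustrative sketch; the owner, repository list, branch names and title are placeholders):

    // Sketch: create PRs one at a time, at least one second apart, per the
    // secondary-rate-limit guidance quoted above.
    import { Octokit } from "@octokit/rest";

    const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
    const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

    async function createPullRequestsSerially(owner: string, repos: string[]): Promise<void> {
      for (const repo of repos) {
        await octokit.pulls.create({
          owner,
          repo,
          title: "ci: sync shared workflows",
          head: "ci/sync-workflows",
          base: "master",
        });
        await sleep(1000); // keep content-creating requests at least one second apart
      }
    }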

peter-evans avatar Aug 18 '21 04:08 peter-evans

@marten-seemann Please could you let me know when you run another deployment. I would like to check if 403 error responses are being successfully retried.

peter-evans avatar Sep 01 '21 01:09 peter-evans

Hi @peter-evans, first of all, thanks again for all your work! It will probably be a while before we run another deployment. Deploying to 150 repos creates a lot of noise.

  The first two points above are why I think you are being caught by the abuse detection. You are using a PAT created on one user for all the requests, so some are being executed concurrently and are not 1 second apart.

I think you're right. We might have to change the script to be a little bit less aggressive here.

marten-seemann avatar Sep 05 '21 15:09 marten-seemann

A random remark: Octokit's throttling plugin was broken for some weeks due to a change in the error message GitHub sends upon hitting the abuse rate limit - see https://github.com/octokit/plugin-throttling.js/issues/437

dontcallmedom avatar Sep 23 '21 10:09 dontcallmedom

@dontcallmedom Thanks. Good to know! It makes sense now why the message changed.

peter-evans avatar Sep 23 '21 12:09 peter-evans

I was also hitting the rate limit issue; I will try this branch to see if it helps!

villelahdenvuo avatar Oct 12 '22 13:10 villelahdenvuo

@villelahdenvuo I don't recommend using the retry branch of this action. It's very old now and has missed some important updates. If you are hitting the rate limit then there are probably things you should do in your workflows to slow down processing.

@marten-seemann Are you still using this branch?

I think I need to revisit the code in this branch and decide whether or not to merge some of it into main. I have a feeling that not all the code in this branch was working as intended and/or didn't really make a difference.

peter-evans avatar Oct 21 '22 07:10 peter-evans

Our workflow just failed with this error:

Create or update the pull request
  Attempting creation of pull request
  Error: API rate limit exceeded for user ID 111111111.

where 111111111 was some number

paulz avatar Oct 31 '22 02:10 paulz

Re-running the failed scheduled workflow seems to work.

I don't understand how we can be rate limited if we use our own repo-scoped token:

 token: ${{ secrets.REPO_SCOPED_TOKEN }}

paulz avatar Oct 31 '22 02:10 paulz

@paulz Using a PAT doesn't stop you from being rate-limited. I think GitHub are just more generous with the limits for authenticated user accounts. If you have used the same PAT across many workflows that are running simultaneously, then I imagine you could hit the rate limit. (I think even multiple PATs associated with the same user account contribute to the same rate limit.)

peter-evans avatar Oct 31 '22 02:10 peter-evans

I'm hitting secondary limits, and we aren't doing much... maybe 1 PR a minute. We specify the token as well, but it doesn't seem to be helping. Also, this token has PLENTY of requests remaining:

{
  "limit": 5000,
  "used": 21,
  "remaining": 4979,
  "reset": 1669931929
}

I wonder if GitHub is going to put up a status notice soon about some degradation.

jmmclean avatar Dec 01 '22 21:12 jmmclean

I actually ran into this issue myself recently when a lot of automated PRs were being created for tests.

Error: You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.

I'm fairly sure this error is "unretryable" in the sense that the action can't just wait and retry it, because GitHub forces you to wait a considerable length of time. It's not the kind of rate limiting that lets the request through again after a few seconds, so it's not feasible for the action to wait that long.

The conclusion in this comment still stands. If you run into this issue, then the answer is to redesign your workflows to either slow down, or use PATs created on multiple machine user accounts.

I'm going to delete the retry branch soon. So if you are still using it, please move to v4.

peter-evans avatar Dec 13 '22 06:12 peter-evans