terraform-aws-github-runner icon indicating copy to clipboard operation
terraform-aws-github-runner copied to clipboard

fix: Add delay to prevent ssm rate limits if instance count is greater than max throughput

Open devsh4 opened this issue 3 years ago • 6 comments

Fix for bug #1841 which prevents hitting AWS SSM rate limits by adding a delay between subsequent putParameter calls.

Note: 25ms delay is based on the AWS service quotas as they only allow 40 requests per second to the parameter store by default.

[Tests]:

  • Pre commit checks ✅
  • Ran build locally ✅
  • Ran linter ✅
  • Ran jest tests ✅

Related PR (never got merged): https://github.com/philips-labs/terraform-aws-github-runner/pull/1843

devsh4 avatar Aug 02 '22 16:08 devsh4

@npalm Looks like the jest step is failing because the coverage is dropping below 92%, however all the tests itself are passing.

Jest: "global" coverage threshold for branches (92%) not met: 91.37%

devsh4 avatar Aug 02 '22 18:08 devsh4

Guess this is caused since condional delay black is not touched in the test. This requires to add a test with adding 40 runners. Right now not more time to dig in further. Will check later.

npalm avatar Aug 03 '22 18:08 npalm

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 03 '22 02:09 github-actions[bot]

@npalm Any chance you can look at this? I tried adding some tests with 40 runners but it didn't help.

devsh4 avatar Sep 03 '22 03:09 devsh4

Yes will do. But will be after next week.

npalm avatar Sep 03 '22 07:09 npalm

@GuptaNavdeep1983 would you have tim to have a look on this PR as well?

npalm avatar Sep 16 '22 15:09 npalm

Any update on this? Looking to upgrade the module version but cannot until we have this PR merged in.

devsh4 avatar Oct 04 '22 17:10 devsh4

Sorry no updates, bit restricted by time. Sorry.

npalm avatar Oct 05 '22 10:10 npalm

@devsh4 can you have a look at the recent changes?

GuptaNavdeep1983 avatar Oct 07 '22 15:10 GuptaNavdeep1983

Ther refered AWS document only mention GetParameters to be rate limitted. The Lambda only calls putParatameter. I can't see a specific limit for the putParamaters. Hoever I have seen RateLimits as well when we try to scale aggressive. AWS also used a bucket to rate limite. As soon you bucket is empty ou got rate limitted. The moudle makes several AWS calls. PutParameter wil certainly tak a part in this.

But not sure if delaying the SSM put parameter calal will solve the problem. Once solving the proble here, maybe call the GetParamter from the instance start can be the next limit.

npalm avatar Oct 10 '22 11:10 npalm

AWS service quotas

@npalm yes I believe putParameter call does play a role in hitting rate limits. We are running the forked version of this repo with the initial change I made here and it has been running seamlessly since the past 6 months for spinning up more than ~250 runners at once without hitting rate limits.

devsh4 avatar Oct 11 '22 22:10 devsh4

AWS service quotas

@npalm yes I believe putParameter call does play a role in hitting rate limits. We are running the forked version of this repo with the initial change I made here and it has been running seamlessly since the past 6 months for spinning up more than ~250 runners at once without hitting rate limits.

Can you refer to the fork you running here (in case it is OS)?

npalm avatar Oct 12 '22 06:10 npalm

@npalm Sure forked version is here, it's linked above as well. FYI we are using Linux runners.

devsh4 avatar Oct 14 '22 00:10 devsh4

@npalm Any update on this?

devsh4 avatar Nov 08 '22 15:11 devsh4

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Dec 09 '22 02:12 github-actions[bot]