Error: Token pool is exhausted
Sentry Issue: SHIELDS-77
```
Error: Token pool is exhausted
  File "/app/core/token-pooling/token-pool.js", line 268, in TokenPool._nextBatch
    throw Error('Token pool is exhausted')
  File "/app/core/token-pooling/token-pool.js", line 305, in TokenPool.next
    token = this._nextBatch()
  File "/app/services/librariesio/librariesio-api-provider.js", line 75, in LibrariesIoApiProvider.fetch
    token = this.standardTokens.next()
  File "/app/core/base-service/base.js", line 233, in LibrariesIoRepoDependencies._request
    const { res, buffer } = await this._requestFetcher(url, options)
  File "/app/core/base-service/base-json.js", line 45, in LibrariesIoRepoDependencies._requestJson
    const { buffer } = await this._request({
  ...
(5 additional frame(s) were not displayed)
```
In the last month we have had two of these outages, affecting libraries.io and bower badges. We end up in a state where we're throwing
https://github.com/badges/shields/blob/4fef1335d238a4f89a778e857d265b43273dd184/services/librariesio/librariesio-api-provider.js#L78-L80
as a result of throwing
https://github.com/badges/shields/blob/4fef1335d238a4f89a778e857d265b43273dd184/core/token-pooling/token-pool.js#L268
Once we end up in this state, we seem to be unable to recover from it even though both our tokens have plenty of rate limit available.
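To make the stuck state concrete, here's a minimal sketch of one way it could arise. This is not the real `TokenPool` code (that lives in `core/token-pooling/token-pool.js`); the class and method names here are invented for illustration. If exhaustion were latched as a flag that's never cleared, the pool would stay dead even after quota is restored upstream:

```javascript
// Hypothetical illustration of the suspected failure mode, NOT the
// real TokenPool implementation. All names are invented.
class LatchedPool {
  constructor(tokenIds) {
    this.tokens = tokenIds.map(id => ({ id, usesRemaining: 0 }))
    this.exhausted = false
  }

  // Called when an upstream rate-limit window resets.
  restoreQuota(id, uses) {
    const t = this.tokens.find(t => t.id === id)
    t.usesRemaining = uses
  }

  next() {
    // BUG (for illustration): once set, `exhausted` is never cleared,
    // so restored quota is invisible to callers.
    if (this.exhausted) throw new Error('Token pool is exhausted')
    const t = this.tokens.find(t => t.usesRemaining > 0)
    if (t === undefined) {
      this.exhausted = true
      throw new Error('Token pool is exhausted')
    }
    t.usesRemaining -= 1
    return t.id
  }
}
```

With a pool like this, `next()` keeps throwing after `restoreQuota()` is called, which matches the observed symptom: both tokens have plenty of rate limit available, yet the server never recovers.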
First job is to try and work out how this is happening and reproduce it. Is it a bug in our token pool code, or is there some externality causing it?
We used to have an admin endpoint for inspecting the token pool. Is it worth re-adding that? Or adding some periodic logging of the Libraries.io token pool?
I'll have a go at trying to reproduce it locally first - I've got a couple of ideas. If I'm stumped, we can add some kind of debugging to try to catch it happening in production.
I wouldn't be keen on re-introducing the admin endpoint as a debugging mechanism though, especially given the what/why/when behind the recent removal of that same endpoint.
Fundamentally I think the question boils down to what was summarized in the description: if the pool gets exhausted, can the running server recover the pool and begin using it again once sufficient time has passed for the rate limits to be removed.
Perhaps a decent starting point would be to introduce a unit test in https://github.com/badges/shields/blob/master/core/token-pooling/token-pool.spec.js. AIUI we've got tests that cover the standard happy paths and that an error is thrown upon exhaustion, but I'm not seeing one that covers the "post exhaustion with recovery time" scenario. If we're able to reproduce the behavior with a unit test then that really pins down the root cause.
If we are able to get the next batch of tokens after exhaustion directly via a unit test, then it's probably worth adding some bigger unit/component tests that exercise the relevant surface of the API provider class for librariesio (and probably github too, for a frame of reference) without the underlying token pool being mocked.
Also, I still think the easiest mitigation option is for a 3rd maintainer to add their Libraries.io API token to Heroku. We knew 1 token wouldn't suffice and that 2 might be stretched at times, but I'd think 3 tokens would give us more than enough breathing room (famous last words I know :sweat_smile:)
Chris and I have already copied ours in, and I used a token from a bot/service account in our CI environment, so I'm out of GitHub accounts I can use.
I'm using shields.io in a README file on GitHub, but got the error:
`Unable to select next Github token from pool`
Try this url for example
@JohnyP36 that's #8907 and unrelated to this issue, which is about Libraries.io badges
@calebcartwright thanks, I see.
So what's the solution to this problem? It still exists for me.
Follow https://github.com/badges/shields/issues/9839 for updates on the current libraries.io issue - it is caused by a change to the upstream API. I've put in a workaround for now.