Automation blocked from using the docker hub search API

Open visit1985 opened this issue 3 years ago • 6 comments

Problem description

We use an automation based on docker-hub-rss to monitor our base images for updates and to create issues in our ticket system so the changes can be verified.

Since June 29th, our CI system has been blocked from using the Docker Hub search API for a limited time (less than a day) after calling it via docker-hub-rss.

As this is an essential part of our software lifecycle management, can you please unblock this type of traffic again, tell us what conditions we need to meet in order to not get blocked, or help us to find an alternative notification method?

Debug Information

  1. Use any browser to access https://hub.docker.com/_/alpine
  2. Run docker run -it --name "docker-hub-rss" -p 127.0.0.1:3001:3000 --rm theconnman/docker-hub-rss:latest
  3. Attempts to access http://localhost:3001/_/alpine.atom?includeRegex=%5E%5B0-9%5D%2B%5C.%5B0-9%5D%2B%5C.%5B0-9%5D%2B%24 end in ERR_EMPTY_RESPONSE
  4. Attempts to access https://hub.docker.com/_/alpine again show a 404 page

visit1985 avatar Jun 30 '22 08:06 visit1985

@visit1985 we are not aware of any blocking done by Docker Hub.

We've noticed the docker-hub-rss uses an undocumented Hub API endpoint to read tags.

Moreover, the tool's pagination handling does not appear to check the returned response code, which is 404 when there are no more results to return.

I'd recommend reaching out to the tool authors and asking them to update this piece of code https://github.com/TheConnMan/docker-hub-rss/blob/fb40dabe9e7e82f4d65f8799c76c2462acc8d8f1/api/%5Busername%5D/%5Brepository%5D.js#L44-L46 so that it stops when it encounters a 404 while going through the paginated results.
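As a rough illustration of that suggestion (not the actual docker-hub-rss code, which is JavaScript), here is a minimal Python sketch of a pagination loop that treats a 404 as the end of the results. The `fetch_page` callable, the `"results"` key, and the `max_pages` cap are assumptions made for the example.

```python
from typing import Callable, Iterator, Tuple

# Hypothetical fetcher type: takes a 1-based page number and returns
# (HTTP status code, decoded JSON body). Not part of any real library.
FetchPage = Callable[[int], Tuple[int, dict]]


def iter_tags(fetch_page: FetchPage, max_pages: int = 1000) -> Iterator[dict]:
    """Yield tag entries page by page, stopping at the first 404."""
    for page in range(1, max_pages + 1):
        status, body = fetch_page(page)
        if status == 404:
            # Past the last page: treat this as end of results, not an error.
            return
        if status != 200:
            # Anything else (for example 429) is a real failure; surface it.
            raise RuntimeError(f"page {page} failed with HTTP {status}")
        # Assumed response layout: a "results" list of tag objects.
        yield from body.get("results", [])
```

The `max_pages` cap is only a safety net so that a server that never returns 404 cannot keep the loop running forever.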

We're also not aware of any issues related to visiting https://hub.docker.com/_/alpine.

milosgajdos avatar Jun 30 '22 11:06 milosgajdos

I would also note that I don't see any sort of caching in there (though I'm no JavaScript programmer). If this fires requests to Hub any time someone tries to load the RSS feed, and that happens often, you could easily find yourself hitting the abuse rate limits.

That said, I think the evaluation window for those limits is something like ten minutes. If your IP is locked out for long periods of time, that means it's continuing to make a bunch of queries.

I would recommend looking at something like Diun for better notifications with cron scheduling: https://github.com/crazy-max/diun

binman-docker avatar Jun 30 '22 14:06 binman-docker

> I would recommend looking at something like Diun for better notifications with cron scheduling: https://github.com/crazy-max/diun

Thanks for the hint.

There is caching in the docker-hub-api package, which is used by docker-hub-rss.

After digging a bit deeper, I see an HTTP 429 {"detail": "Rate limit exceeded", "error": false} returned from the API. Maybe the number of tags for one of the repos is too big to query all pages without a delay. I will continue debugging tomorrow and try to fix the pagination on HTTP 4xx.

@milosgajdos Can you tell me what is the rate limit for this API?

visit1985 avatar Jun 30 '22 16:06 visit1985

> There is caching in the docker-hub-api package, which is used by docker-hub-rss.

Ah, perfect. 5 minutes should be fine in this application, though it could probably be increased depending on your needs.
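For readers unfamiliar with this kind of caching, here is a minimal Python sketch of a time-based cache with a configurable lifetime. It is not the docker-hub-api implementation; the `TTLCache` class and its method names are made up for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache one value per key for ttl_seconds so repeated loads reuse it."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no upstream request
        value = fetch()      # missing or expired: one upstream request
        self._entries[key] = (now, value)
        return value
```

With `TTLCache(300)`, any number of feed loads for the same repository within five minutes would trigger only one upstream fetch; raising the value trades freshness for fewer requests.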

> After digging a bit deeper, I see an HTTP 429 {"detail": "Rate limit exceeded", "error": false} returned from the API

Thanks for the error detail! Digging through the code, it looks like the limits for that API are currently set at 600 requests per minute for authenticated requests, and 180 for unauthenticated. On non-limited requests, you should be able to see X-RateLimit-* headers returned with exact details.
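To put those numbers in perspective, here is a small Python sketch that converts a per-minute limit into a minimum spacing between requests and reads the remaining quota from response headers. The limits are the ones quoted above and may change; the exact header name `X-RateLimit-Remaining` is an assumption, since only the `X-RateLimit-*` prefix is confirmed here.

```python
from typing import Mapping, Optional

# Figures quoted in this comment; they may change on the server side.
AUTHENTICATED_PER_MINUTE = 600
UNAUTHENTICATED_PER_MINUTE = 180


def min_interval_seconds(requests_per_minute: int) -> float:
    """Even spacing between requests that stays within a per-minute limit."""
    return 60.0 / requests_per_minute


def remaining_requests(headers: Mapping[str, str]) -> Optional[int]:
    """Read the remaining quota from a response, if the header is present.

    The header name "X-RateLimit-Remaining" is an assumption.
    """
    for name, value in headers.items():
        if name.lower() == "x-ratelimit-remaining":
            try:
                return int(value)
            except ValueError:
                return None  # header present but not a number
    return None
```

For unauthenticated use, `min_interval_seconds(UNAUTHENTICATED_PER_MINUTE)` works out to one request every third of a second, so a client paging through a repository with many tags would need a short delay between pages to stay under the limit.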

binman-docker avatar Jul 01 '22 01:07 binman-docker

Is it possible that the JSON response for errors changed some days ago? Because the way docker-hub-api detects errors seems not to work if I query a non-existent page: {"errinfo":{"namespace":"library","repository":"alpine"},"message":"object not found"}, while it does for a rate limit exception: {"detail": "Rate limit exceeded", "error": false}.
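One way to make the detection less sensitive to such changes is to look at the HTTP status code first and use the body only as a fallback. The Python sketch below shows the idea; it is not how docker-hub-api works, the function name is made up, and the body checks rely only on the two payloads quoted above.

```python
from typing import Any, Mapping


def classify_response(status: int, body: Mapping[str, Any]) -> str:
    """Return "ok", "not_found", "rate_limited", or "error".

    The status code is checked first because it is less likely to change
    than the JSON error layout. The body checks are fallbacks based only
    on the two payloads quoted in this thread.
    """
    if status == 404 or body.get("message") == "object not found":
        return "not_found"
    if status == 429 or body.get("detail") == "Rate limit exceeded":
        return "rate_limited"
    if status >= 400:
        return "error"
    return "ok"
```

A caller could then stop paginating on `"not_found"`, wait and retry on `"rate_limited"`, and report anything classified as `"error"`.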

visit1985 avatar Jul 01 '22 10:07 visit1985

The error message has changed indeed!

milosgajdos avatar Jul 01 '22 16:07 milosgajdos