Workaround for Too Many Requests (HTTP error 429)

Open mitm001 opened this issue 4 years ago • 6 comments

I use Antora to build my documentation site. When it builds pages, it adds the same table of contents and header to every page, and every link has the same class, nav-link. Since my site has 250+ pages, there are thousands of these duplicated links.
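Because every page repeats the same TOC links, most of those thousands of requests go to URLs that have already been checked. A checker that remembers which URLs it has seen only needs to request each one once. The sketch below is a hypothetical Python helper, not part of Liche or this action (the thread does not say whether Liche already deduplicates). The page and link counts in the usage lines are invented for illustration:

```python
def unique_links(pages: dict[str, list[str]]) -> list[str]:
    """Return each distinct URL once, in first-seen order (hypothetical helper)."""
    seen: set[str] = set()
    ordered: list[str] = []
    for links in pages.values():
        for url in links:
            if url not in seen:
                seen.add(url)
                ordered.append(url)
    return ordered


# Hypothetical site: 250 pages that share the same 40 TOC links,
# plus one page-specific link each.
toc = [f"https://docs.example.com/page-{i}.html" for i in range(40)]
site = {
    f"page-{p}.html": toc + [f"https://docs.example.com/extra-{p}.html"]
    for p in range(250)
}
total = sum(len(links) for links in site.values())
print(total, len(unique_links(site)))  # 10250 290
```

With these made-up numbers, roughly 10,000 link occurrences collapse to fewer than 300 actual requests.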

Sites start timing requests out after a certain number of hits, so with the default of 512 concurrent HTTP requests I get thousands of these "Too Many Requests (HTTP error 429)" errors. I reduced the concurrency to 32 to slow things down, which cuts the errors to the hundreds.

I use a regex to skip the header links, which never change, but the links in the TOC are always changing.

Are there any other configuration options I could use to reduce these errors from the TOC? For example, could links be skipped based on the class of the anchor element?
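Skipping by class would mean filtering links before they reach the checker. The thread does not establish that Liche has such an option, so the following is only a hypothetical pre-filter written with Python's standard html.parser. It collects href values from anchors and drops any anchor whose class list contains a given name; nav-link comes from the post above, while the sample HTML is invented:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <a href> values, skipping anchors that carry any class in skip_classes."""

    def __init__(self, skip_classes: set[str]):
        super().__init__()
        self.skip_classes = skip_classes
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        # The class attribute can hold several space-separated names;
        # a valueless attribute is reported as None.
        classes = set((attr_map.get("class") or "").split())
        if href and not classes & self.skip_classes:
            self.links.append(href)


html = """
<nav><a class="nav-link" href="/toc/intro.html">Intro</a>
<a class="nav-link active" href="/toc/setup.html">Setup</a></nav>
<p>See <a href="https://example.com/spec">the spec</a>.</p>
"""
collector = LinkCollector(skip_classes={"nav-link"})
collector.feed(html)
print(collector.links)  # ['https://example.com/spec']
```

The trade-off is that TOC links filtered this way are never checked at all, so this only makes sense if they are validated some other way.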

mitm001 · Jul 17 '20 18:07

Hi @mitm001, I'm afraid I don't know of any configuration that could help you. As you know, this action is a simple wrapper around Liche, so it would probably be best to raise this issue there instead. Perhaps it's related to https://github.com/raviqqe/liche/issues/37.

peter-evans · Jul 18 '20 01:07

Yep, that's what I will do.

mitm001 · Jul 18 '20 20:07

Hi,

We're using your GitHub Action for our documentation as well (thanks!) and have started seeing this problem with github.io links. Looking at the Liche repo, I noticed a CLI option:

    -c, --concurrency <num-requests>  Set max number of concurrent HTTP requests. [default: 512]

If that were configurable through the action's YAML, it could probably help with the Too Many Requests errors. Sure, if you have thousands of links to check you'll end up with a long run, but that's more or less what rate limiting is trying to enforce...
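In principle, a concurrency cap like Liche's -c option limits how many requests are in flight at once. The sketch below illustrates that idea with asyncio.Semaphore in Python; it is not Liche's implementation, and the "request" is simulated with asyncio.sleep so no network access is needed. The URLs and the 200 status are placeholders:

```python
import asyncio


async def check_all(urls, max_concurrency):
    """Simulate checking URLs with at most max_concurrency requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def check(url):
        nonlocal in_flight, peak
        async with sem:  # waits here once max_concurrency slots are taken
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a real HTTP request
            in_flight -= 1
            return url, 200  # placeholder status, no real request is made

    results = await asyncio.gather(*(check(u) for u in urls))
    return results, peak


urls = [f"https://docs.example.com/page-{i}.html" for i in range(100)]
results, peak = asyncio.run(check_all(urls, max_concurrency=32))
print(len(results), peak)  # peak never exceeds 32
```

A cap on in-flight requests lengthens the run, but it does not bound how many requests a single host sees per second: each slot is reused as soon as its request finishes.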

I'm not sure issue #37 applies to us, because rate limiting on GitHub's side sounds deterministic, while our errors are not.

ionut-arm · Oct 07 '20 12:10

Hi @ionut-arm

Good point. Liche arguments are configurable via the args input, so I think the following example should work. I don't know how low the number of concurrent requests needs to be to avoid this issue, though; that would just require some experimentation.

    - name: Link Checker
      uses: peter-evans/link-checker@v1
      with:
        args: -v -r -c 48 *

peter-evans · Oct 08 '20 01:10

Even with a concurrency of 1 it fails, since the problem seems to be not (only) the number of concurrent connections but the number of connections within a given time window: https://github.com/raviqqe/liche/issues/42. This is probably due to keep-alive requests. Basically, it would require adding a delay before checking the same host again 🤔.
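The per-host delay described above could look roughly like this. It is a hypothetical Python sketch, not something Liche provides. HostThrottle enforces a minimum interval between consecutive requests to the same host. The clock and sleep functions are injectable so the demo runs instantly with a fake clock; the 0.5 s interval and the hostnames are made-up values:

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum interval between consecutive requests to the same host."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_request: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until url's host may be contacted again; return seconds waited."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        last = self.last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay:
            self.sleep(delay)
        self.last_request[host] = now + delay
        return delay


class FakeClock:
    """Deterministic clock for the demo: sleeping just advances time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
throttle = HostThrottle(min_interval=0.5, clock=clock.now, sleep=clock.sleep)
waits = [throttle.wait(u) for u in [
    "https://docs.example.com/a",
    "https://docs.example.com/b",
    "https://example.org/x",
    "https://docs.example.com/c",
]]
print(waits)  # [0.0, 0.5, 0.0, 0.5]
```

Requests to different hosts are not delayed by each other. This version is single-threaded; a concurrent checker would also need to guard last_request with a lock.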

MichaIng · Oct 15 '20 15:10

Liche was recently deprecated, and as a result I've also decided to deprecate this action in favour of lychee-action, which is a fork of this project based on lychee. Please consider using that action.

According to the readme:

For GitHub links, it can optionally use a GITHUB_TOKEN to avoid getting blocked by the rate limiter.

peter-evans · Jan 04 '21 02:01