Improve or add more information when a "Network error" is reported
Currently, it's quite difficult to debug and troubleshoot errors reported by Lychee. For some links, all we get is something like:
Network error: error sending request for url (https://www.foobar.com/) Maybe a certificate error?
Even though we have followed the documentation regarding this issue and increased the log level, it is still pretty difficult to know what went wrong and how to fix it.
It would be nice to simply show further information about why an error of this kind occurred, or the underlying error itself.
The relevant code where we extract additional information from the underlying error is here. As you can see, there's not much going on except us trying to trim the first part, which is "error trying to connect:". If that string doesn't exist in the reqwest error, we simply use the string that gets returned. Other than that prefix, we don't strip any information. (Also see this quick test code.)
In summary, that means that there's no additional information being returned from reqwest, so I guess that would be an issue on their end? Unless someone finds a bug in that code.
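For illustration, the trimming boils down to something like this (a rough sketch, not the exact lychee source):

```rust
// Rough sketch of the trimming described above (not the exact lychee code):
// keep only what follows "error trying to connect:" if that prefix is
// present; otherwise pass the reqwest error string through unchanged.
fn trim_error_output(e: &reqwest::Error) -> String {
    let message = e.to_string();
    if let Some((_, cause)) = message.split_once("error trying to connect:") {
        return cause.trim().to_string();
    }
    message
}
```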
Thanks @mre for the quick reply. I see.
If no further information can be logged, then it is what it is...
All of a sudden we're getting quite a lot of these errors in our GitHub workflows, for links to our own websites... yet everything looks fine when using a cURL command in our terminals.
As mentioned, we updated the ca-certificates in the Ubuntu GitHub runner, used the --insecure flag, and increased the verbosity, but found nothing.
So I'm considering using the Lychee Docker image instead, to see if the problem is coming from the environment.
Is there any further recommendation on how to tackle these errors? The failures are completely random from run to run, but almost all of them relate to our own hosted links.
If it helps, you can share a set of failing links with me via email or paste them here. I'll take a look then.
That would be really awesome. Thanks a lot.
Here is a sample report we get:
As I said, the errors shown for each run change, so it's not always the same links, and locally it's impossible to reproduce.
Here's what I get from my local machine:
❯❯❯ lychee -vvv link-check-results1.md
[404] https://github.com/hivemq/hivemq4-documentation/actions/runs/15577839131?check_suite_focus=true | Rejected status code (this depends on your "accept" configuration): Not Found
[DEBUG] Redirecting to https://www.hivemq.com/products/mqtt-cloud-broker/
[200] https://docs.hivemq.com/hivemq/latest/user-guide/diagnostics-mode.html
[200] https://docs.hivemq.com/hivemq/latest/user-guide/diagnostics-mode.html
[DEBUG] Redirecting to https://www.hivemq.com/legal/privacy-policy
[200] https://www.hivemq.com/pricing/
[200] https://docs.hivemq.com/hivemq/latest/user-guide/restrictions.html
[200] https://www.hivemq.com/pricing/
[200] https://www.hivemq.com/blog/mqtt5-essentials-part5-client-feedback-negative-ack/
[200] https://www.hivemq.com/blog/mqtt-essentials-part2-publish-subscribe/
[200] https://docs.hivemq.com/hivemq/latest/user-guide/restrictions.html
[200] https://docs.hivemq.com/hivemq/latest/data-hub/quick-start.html
[200] https://www.hivemq.com/blog/mqtt5-essentials-part5-client-feedback-negative-ack/
[200] https://docs.hivemq.com/hivemq/latest/data-hub/quick-start.html
[200] https://www.hivemq.com/blog/mqtt-essentials-part2-publish-subscribe/
[200] https://www.hivemq.com/changelog/hivemq-platform-operator-1-1-0-release/
[200] https://docs.hivemq.com/hivemq/latest/data-hub/policies.html
[200] https://docs.hivemq.com/hivemq/latest/data-hub/policies.html
[DEBUG] Redirecting to https://www.hivemq.com/legal/privacy-policy
[DEBUG] Redirecting to https://www.hivemq.com/products/mqtt-cloud-broker/
[200] https://www.hivemq.com/mqtt-cloud-broker/
[DEBUG] Redirecting to https://www.hivemq.com/legal/privacy-policy/
[DEBUG] Redirecting to https://www.hivemq.com/legal/privacy-policy/
[200] https://www.hivemq.com/mqtt-cloud-broker/
[200] https://www.hivemq.com/privacy-policy/
[200] https://www.hivemq.com/privacy-policy/
[200] https://caniuse.com/#feat=websockets
[200] https://caniuse.com/#feat=websockets
[200] https://caniuse.com/?search=406.shtml
Issues found in 1 input. Find details below.
[link-check-results1.md]:
[404] https://github.com/hivemq/hivemq4-documentation/actions/runs/15577839131?check_suite_focus=true | Rejected status code (this depends on your "accept" configuration): Not Found
🔍 23 Total (in 1s) ✅ 22 OK 🚫 1 Error
[WARN ] There were issues with GitHub URLs. You could try setting a GitHub token and running lychee again.
79698 URLs is a lot. It might be that the server cannot handle the amount of incoming requests, i.e. the request queue is full on the server-side. I don't know how many of those links are pointing to your hivemq infrastructure, but I assume most of them based on your earlier comment?
Can you try again with a more conservative concurrency limit? --max-concurrency 32 or so?
Wow... awesome! It did work!...
Status | Count
-- | --
🔍 Total | 79860
✅ Successful | 79834
⏳ Timeouts | 0
🔀 Redirected | 0
👻 Excluded | 26
❓ Unknown | 0
🚫 Errors | 0
But I have to say that it's been a while since we first started using Lychee, and we didn't have these kinds of issues before with the same number of links.
the request queue is full on the server-side
So it might now be a problem on our side, in the server hosting these HTML files (Netlify in our case)? What I still don't understand, if that's the case, is why it works fine from a local run but not from a GitHub runner.
Is there any particular recommended setup for this kind of load? Also, would it be possible to somehow show something about this in the logs? Otherwise, others may end up in a situation like this.
In any case, thanks a lot for your help. Really appreciated!
Less frequently, but this error is still being reported from time to time.
Wow... awesome! It did work!...
Progress! 🥳
Less frequently, but this error is still being reported from time to time.
Well... okay. 👉 👈 😅
But I have to say that it's been a while since we first started using Lychee, and we didn't have these kinds of issues before with the same number of links.
We haven't made any significant changes to our checking algorithm. But it could be a variety of factors, such as the server config changing on Netlify, bot detection, rate-limiting for GitHub runners, etc. Hard to track down.
So it might now be a problem on our side, in the server hosting these HTML files (Netlify in our case)? What I still don't understand, if that's the case, is why it works fine from a local run but not from a GitHub runner.
I'm not sure if Netlify provides public logs for that; they would help to troubleshoot the problem.
Is there any particular recommended setup for this kind of load?
In general, I recommend being conservative with the concurrency numbers. It's always a tradeoff between total runtime and avoiding server overload, and that depends on the server infrastructure. The client itself just maxes out the network card. Perhaps --max-concurrency 16 --retry-wait-time 5 --max-retries 1 helps bring down the load.
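The equivalent settings in a config file would be:
max_concurrency = 16
retry_wait_time = 5
max_retries = 1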
Also, would it be possible to somehow show something about this in the logs? Otherwise, others may end up in a situation like this.
Not sure what else we could log that we don't already. As I mentioned, reqwest might just not provide any more details. Apart from that, I can't think of much else. Perhaps the number of retries before we gave up? But that's sort of implicit given the config. Open for suggestions here.
In any case, thanks a lot for your help. Really appreciated!
You're welcome.
Seems the issue is still happening (though much less often, I have to admit)... even with the following setup:
max_retries = 1
max_concurrency = 10
retry_wait_time = 5
If there is a problem with the Netlify server certificate, I'd guess the --insecure flag should help here?
It seems we have improved the results quite a bit by using the following setup, along with the --insecure flag:
max_retries = 1
max_concurrency = 3
retry_wait_time = 5
The only drawback is that each run takes a little longer now. But it looks promising so far.
Can you try https://github.com/lycheeverse/lychee/pull/1731? It is an attempt to get more information from the error.
You can do so by installing the binary with cargo.
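Something like this should do it (assuming the binary package in the repo is named lychee):

cargo install --git https://github.com/lycheeverse/lychee.git --branch reqwest-err lychee

The gist of the attempt is to surface the underlying cause instead of only reqwest's top-level message. A hedged sketch of the idea (not the actual diff):

```rust
use std::error::Error as StdError;

// Sketch: walk the error's source chain down to the root cause
// (io::Error, TLS error, ...) and append each layer to the message.
fn detailed_message(err: &reqwest::Error) -> String {
    let mut message = err.to_string();
    let mut source = err.source();
    while let Some(cause) = source {
        message.push_str(": ");
        message.push_str(&cause.to_string());
        source = cause.source();
    }
    message
}
```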
Running that reqwest-err branch on GitHub, we can now see a little more information about the errors (and without the settings I mentioned above).
See:
Error: https://docs.hivemq.com/hivemq/latest/user-guide/restrictions.html#throttle-connections | Network error: Connection failed - check network connectivity and firewall settings
Error: https://www.hivemq.com/privacy-policy/ | Network error: Connection failed - check network connectivity and firewall settings
Error: https://www.hivemq.com/privacy-policy/ | Network error: Connection failed - check network connectivity and firewall settings
Error: https://www.hivemq.com/downloads/ | Network error: Connection closed before response completed
Error: https://www.hivemq.com/downloads/ | Network error: Connection closed before response completed
Error: https://www.hivemq.com/privacy-policy/ | Network error: Connection closed before response completed
Error: https://www.hivemq.com/blog/introducing-flexible-mqtt-platform-upgrades-hivemq/ | Network error: Connection closed before response completed
Error: https://www.hivemq.com/blog/introducing-flexible-mqtt-platform-upgrades-hivemq/ | Network error: Connection failed - check network connectivity and firewall settings
[TIMEOUT] https://docs.hivemq.com/hivemq/latest/data-hub/behavior-models.html | Timeout
[TIMEOUT] https://docs.hivemq.com/hivemq/latest/control-center/trace-recordings.html | Timeout
[TIMEOUT] https://docs.hivemq.com/hivemq/latest/upgrade/4-12-to-4-13.html | Timeout
[TIMEOUT] https://docs.hivemq.com/hivemq/latest/upgrade/4-19-to-4-20.html | Timeout
[TIMEOUT] https://docs.hivemq.com/hivemq/latest/upgrade/4-25-to-4-26.html | Timeout
[TIMEOUT] https://www.hivemq.com/changelog/whats-new-in-hivemq-4-31/ | Timeout
[TIMEOUT] https://www.hivemq.com/blog/mqtt5-essentials-part4-session-and-message-expiry/ | Timeout
[TIMEOUT] https://www.hivemq.com/blog/hivemq-cluster-docker-kubernetes/ | Timeout
[TIMEOUT] https://docs.hivemq.com/hivemq/latest/user-guide/docker.html | Timeout
[TIMEOUT] https://www.hivemq.com/blog/mqtt5-essentials-part8-payload-format-description/ | Timeout
[TIMEOUT] https://www.hivemq.com/mqtt-essentials-part-4-mqtt-publish-subscribe-unsubscribe/ | Timeout
[TIMEOUT] https://www.hivemq.com/changelog/hivemq-platform-operator-1-7-0-release/ | Timeout
[TIMEOUT] https://www.hivemq.com/changelog/whats-new-in-hivemq-4-23/ | Timeout
[TIMEOUT] https://www.hivemq.com/changelog/whats-new-in-hivemq-4-29/ | Timeout
Error: Process completed with exit code 2.
Hey @afalhambra-hivemq,
Looking at your latest error output, I think I can see what's actually going on here. Your hosting setup (Netlify) is actively identifying and blocking the GitHub Actions runner as a potential bot/scraper.
The error progression is pretty telling:
- First: "Connection failed" (initial blocking)
- Then: "Connection closed before response completed" (mid-request kills)
- Finally: Complete timeouts (IP likely blacklisted)
This is classic bot detection behavior. The system lets connections start, watches the request patterns, then kills them when it detects automated traffic. GitHub Actions runners come from shared cloud IP ranges that are often flagged, plus 79k requests probably looks exactly like a scraping attack to their systems.
Your low concurrency approach helps because it makes the traffic look more human-like, but you're still fighting against bot detection that's specifically looking for patterns like lychee's user-agent + high volume + CI IP ranges.
One thing you might consider - since this is your own infrastructure, you could add lychee as an allowed user-agent in your Netlify config if possible. Or even better, create a custom user-agent string that only you know about (like "lychee-hivemq-internal") so your bot detection recognizes it as legitimate internal tooling rather than external scraping. I saw some info here, but this is mostly about blocking bots instead of allowing them.
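lychee lets you override the user agent via its --user-agent option, so on your side that could look like (the exact value is up to you):

lychee --user-agent "lychee-hivemq-internal" <your inputs>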
I think you'd have to wait for #1605 to fix this. The result would be pretty similar to your manual throttling, though.
As for the changes in my PR (#1731), it looks like they were helpful in narrowing down the root cause, right? In that case, I'd try to get it in shape for merging.
Really impressive and useful analysis! And makes perfect sense to me.
One thing you might consider - since this is your own infrastructure, you could add lychee as an allowed user-agent in your Netlify config if possible. Or even better, create a custom user-agent string that only you know about (like "lychee-hivemq-internal") so your bot detection recognizes it as legitimate internal tooling rather than external scraping. I saw some info here, but this is mostly about blocking bots instead of allowing them.
Yeah, it makes sense to me to create a custom user-agent string and configure it in our Netlify as a trusted one, instead of it being considered a bot/scraper.
I think you'd have to wait for https://github.com/lycheeverse/lychee/issues/1605 to fix this. The result would be pretty similar to your manual throttling, though.
It makes more sense to me to create this custom user-agent for Netlify, though I guess this would also help.
As for the changes in my PR (https://github.com/lycheeverse/lychee/pull/1731), it looks like they were helpful in narrowing down the root cause, right? In that case, I'd try to get it in shape for merging.
Totally, this will be pretty useful for identifying these kinds of behaviours, as you spotted.
Thanks again for your help and analysis!
https://github.com/lycheeverse/lychee/pull/1731 is merged. I'm also working on per-host rate limiting here, which should also alleviate the problem. Looks like we can close this. 🥳
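For anyone curious what per-host rate limiting means in practice, here's a minimal sketch of the idea (illustrative only, not lychee's actual implementation): every host gets its own semaphore, so a single host is never hit with more than a few concurrent requests, while overall throughput across hosts stays high.

```rust
use std::{collections::HashMap, sync::Arc};
use tokio::sync::{Mutex, OwnedSemaphorePermit, Semaphore};

// Illustrative sketch of per-host rate limiting (not lychee's actual code):
// each host gets its own semaphore, capping concurrent requests per host
// without limiting overall throughput across different hosts.
struct PerHostLimiter {
    permits_per_host: usize,
    semaphores: Mutex<HashMap<String, Arc<Semaphore>>>,
}

impl PerHostLimiter {
    fn new(permits_per_host: usize) -> Self {
        Self {
            permits_per_host,
            semaphores: Mutex::new(HashMap::new()),
        }
    }

    // Hold the returned permit for the duration of a request; dropping it
    // frees a slot for the next request to the same host.
    async fn acquire(&self, host: &str) -> OwnedSemaphorePermit {
        let semaphore = {
            let mut map = self.semaphores.lock().await;
            map.entry(host.to_string())
                .or_insert_with(|| Arc::new(Semaphore::new(self.permits_per_host)))
                .clone()
        };
        semaphore.acquire_owned().await.expect("semaphore closed")
    }
}
```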