broken-link-checker
                                
HTTP_403 while curl gives HTTP_200
I have encountered a link that blc considers broken but that opens fine in curl or a browser.
Here it is:
blc https://www.nginx.com
CURL works fine:
$ curl -I https://www.nginx.com
HTTP/1.1 200 OK
Date: Mon, 02 Jan 2017 06:52:15 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Pingback: https://www.nginx.com/xmlrpc.php
Link: <https://www.nginx.com/wp-json/>; rel="https://api.w.org/"
Link: <https://www.nginx.com/>; rel=shortlink
Link: <https://www.nginx.com/wp-json>; rel="https://github.com/WP-API/WP-API"
X-User-Agent: standard
X-Cache-Config: 0 0
Vary: Accept-Encoding, User-Agent
X-Cache-Status: MISS
Server: nginx
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-Sucuri-ID: 14010
but BLC does not:
$ blc https://www.nginx.com
Getting links from: https://www.nginx.com/
Error: HTML could not be retrieved
Setting a custom user agent does not help:
$ blc --input https://www.nginx.com --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.3 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.3"
Getting links from: https://www.nginx.com/
Error: HTML could not be retrieved
What is the problem? Is this specific to the NGINX site, or is a larger set of websites affected?
It could be caused by bhttp, which does not have an HTTPS agent and only partially works with HTTPS.
Thanks!
So the issue will persist until bhttp releases a fix, right?
Yeah. I've spoken with that project's creator, and he's been working on the next major version. Not sure when it will be released, though.
Great, thank you!
I would leave the issue open if you don't mind. I'll close as soon as they release a fix.
No problem. I think it makes sense to keep it open, as it is an issue that ~~needs fixing~~ he plans to fix along with breaking changes in a major release.
Hi!
It's been more than 3 months. Has bhttp released the fix?
Nope 👎 I'll have to switch to something else in 0.8.x and so far I've been looking at axios.
@stevenvachon can I exclude links with a 403 error from the blc report?
@vkotovv not currently in the CLI. You can create a custom report with the programmatic API, though.
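A minimal sketch of what such a custom report could look like, assuming the 0.7.x programmatic API (`blc.SiteChecker` with `link` and `end` handlers) and its result fields `broken`, `brokenReason`, `url.resolved`, `url.original`, and `base.resolved`. The names `IGNORED_REASONS`, `shouldReport`, `formatResult`, and `checkSite` are hypothetical helpers made up for this example, not part of the library. Note that this only filters individual link results; it does not address the "HTML could not be retrieved" failure from the original report.

```javascript
// Sketch only: filters link results before reporting them.
// Assumes the result shape of broken-link-checker 0.7.x:
//   result.broken (boolean), result.brokenReason (e.g. "HTTP_403"),
//   result.url.resolved / result.url.original, result.base.resolved.

// Reasons to leave out of the report (a hypothetical choice for this thread).
const IGNORED_REASONS = new Set(['HTTP_403']);

// Returns true when a link result should appear in the custom report.
function shouldReport(result, ignored = IGNORED_REASONS) {
  return Boolean(result.broken) && !ignored.has(result.brokenReason);
}

// Formats one report line; falls back to the original URL if it was not resolved.
function formatResult(result) {
  const url = result.url.resolved || result.url.original;
  return `${result.brokenReason} ${url} (on ${result.base.resolved})`;
}

// Wires the filter into a site check. The library is required lazily so the
// helpers above can be used without it being installed.
function checkSite(siteUrl, onDone) {
  const blc = require('broken-link-checker');
  const lines = [];
  const checker = new blc.SiteChecker({}, {
    // Called once per discovered link.
    link(result) {
      if (shouldReport(result)) lines.push(formatResult(result));
    },
    // Called when the whole queue has been processed.
    end() {
      onDone(lines);
    },
  });
  checker.enqueue(siteUrl);
}

// Usage (needs network access and the broken-link-checker package):
// checkSite('https://www.nginx.com', (lines) => console.log(lines.join('\n')));
```

Keeping the filter as a separate function makes it easy to extend the ignored set (for example with other `HTTP_*` reasons) without touching the checker wiring.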
Is there a workaround for this? I am currently blocked. node-fetch works, bhttp simply does not.
Is this fixed in the v0.8.0 branch?