HTTP_403 while curl gives HTTP_200

Open dbogatov opened this issue 8 years ago • 11 comments

I have encountered a link that blc reports as broken but that opens fine in curl or a browser.

Here it is:

blc https://www.nginx.com

curl works fine:

$ curl -I https://www.nginx.com
HTTP/1.1 200 OK
Date: Mon, 02 Jan 2017 06:52:15 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Pingback: https://www.nginx.com/xmlrpc.php
Link: <https://www.nginx.com/wp-json/>; rel="https://api.w.org/"
Link: <https://www.nginx.com/>; rel=shortlink
Link: <https://www.nginx.com/wp-json>; rel="https://github.com/WP-API/WP-API"
X-User-Agent: standard
X-Cache-Config: 0 0
Vary: Accept-Encoding, User-Agent
X-Cache-Status: MISS
Server: nginx
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-Sucuri-ID: 14010

but blc does not:

$ blc https://www.nginx.com
Getting links from: https://www.nginx.com/
Error: HTML could not be retrieved

Setting a browser user agent does not help:

$ blc --input https://www.nginx.com --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.3 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.3"
Getting links from: https://www.nginx.com/
Error: HTML could not be retrieved

What is the problem? Is it specific to the NGINX site, or is a larger set of websites affected?

dbogatov avatar Jan 02 '17 06:01 dbogatov

It could be caused by bhttp, which does not have an HTTPS agent and only partially works with HTTPS.

stevenvachon avatar Jan 03 '17 03:01 stevenvachon

Thanks!

So the issue will persist until bhttp releases a fix, right?

dbogatov avatar Jan 03 '17 03:01 dbogatov

Yeah. I've spoken with that project's creator, and he's been working on the next major version. Not sure when it will be released, though.

stevenvachon avatar Jan 03 '17 03:01 stevenvachon

Great, thank you!

I would like to leave the issue open, if you don't mind. I'll close it as soon as they release a fix.

dbogatov avatar Jan 03 '17 03:01 dbogatov

No problem. I think it makes sense to keep it open, as it is an issue that ~~needs fixing~~ he plans to fix along with breaking changes in a major release.

stevenvachon avatar Jan 03 '17 03:01 stevenvachon

Hi!

It's been more than 3 months. Has bhttp released the fix?

dbogatov avatar Apr 11 '17 00:04 dbogatov

Nope 👎 I'll have to switch to something else in 0.8.x and so far I've been looking at axios.

stevenvachon avatar Apr 11 '17 13:04 stevenvachon

@stevenvachon can I exclude links with a 403 error from the blc report?

vkotovv avatar May 12 '17 14:05 vkotovv

@vkotovv not currently in the CLI. You can create a custom report with the programmatic API, though.

stevenvachon avatar May 12 '17 15:05 stevenvachon
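
For readers looking for the custom-report route mentioned above, here is a minimal sketch of what it might look like. It assumes the 0.7.x programmatic API (`HtmlUrlChecker` with `link` and `end` handlers, and `broken` / `brokenReason` fields on each link result); the helper name `shouldReport`, the `IGNORED_REASONS` list, and `checkPage` are hypothetical and not part of the package.

```javascript
// Reasons to leave out of the report; "HTTP_403" matches this issue's title.
const IGNORED_REASONS = ["HTTP_403"];

// Hypothetical helper: true when a link result should appear in the report.
function shouldReport(result, ignoredReasons) {
  return Boolean(result.broken) && !ignoredReasons.includes(result.brokenReason);
}

// Wiring sketch: checks the links found on one page and prints a filtered report.
function checkPage(pageUrl) {
  // Required lazily so the helper above can be used without the package installed.
  const blc = require("broken-link-checker");

  const checker = new blc.HtmlUrlChecker({}, {
    // Called once per link found on the page.
    link: (result) => {
      if (shouldReport(result, IGNORED_REASONS)) {
        console.log(`${result.brokenReason}  ${result.url.resolved}`);
      }
    },
    // Called when the queue of pages is empty.
    end: () => console.log("Finished"),
  });

  checker.enqueue(pageUrl);
}
```

Calling `checkPage("https://example.com")` would then list broken links other than those ignored. Note that this only filters individual link results; it would not help when the page itself cannot be retrieved, as in the original report.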

Is there a workaround for this? I am currently blocked. node-fetch works, bhttp simply does not.

Glavin001 avatar Jul 02 '19 17:07 Glavin001

Is this fixed in the v0.8.0 branch?

stevenvachon avatar Jul 16 '19 23:07 stevenvachon