
TLS Error on Robots.txt is not handled in OnError

Open sundarv85 opened this issue 2 years ago • 1 comments

I'm running a test project on localhost:8000, and when I access it over HTTPS the request fails, which is expected:

Get "https://localhost:8000/": tls: first record does not look like a TLS handshake

The error above is correctly caught in OnError. However, when I set ignoreRobots to false, colly first tries to fetch robots.txt, and the failure below

Get "https://localhost:8000/robots.txt": tls: first record does not look like a TLS handshake

is not propagated to OnError, because it does not originate from the request I started: colly first tries to fetch robots.txt, and that fetch fails. Could this error also be propagated to OnError, or be made catchable through a known colly error value such as:

ErrRobotsTxtBlocked = errors.New("URL blocked by robots.txt")
ErrRobotsTxtFetchFailed = errors.New("Unable to fetch robots.txt") // New Error Code

sundarv85, Nov 29 '22 07:11

This proposal makes sense.

WGH-, Jan 05 '23 21:01