broken-link-checker

Add timeout

Open joshribakoff opened this issue 8 years ago • 2 comments

cd repos/node_modules/broken-link-checker/
[josh@localhost broken-link-checker]$ grep -r 'timeout' .

No timeout option exists.

https://github.com/stevenvachon/broken-link-checker/issues/67

The timeout is defined by the operating system and is usually 2000ms. Setting Node's http timeout to anything longer than that will not override the OS setting. Setting to anything shorter might be insufficient.

I've implemented timeouts in Node.js just fine. If there is some kind of OS issue, you could use setTimeout() & clearTimeout() to work around this.

What happens is BLC tries to do an HTTP request to some random IP not running a web server. The whole scraper/queue blocks until the OS times out the TCP socket like you said (which is because I enqueue only 1 page at a time, then wait until it completes to enqueue the next page). Basically I'm running multiple instances of BLC on different servers & reading URLs from my own queue system, as opposed to using the internal queue, which would only be local to 1 server.

In your script you can implement a user-land timeout that defaults to 20 seconds and can be customized. Simply use setTimeout to register a callback that removes your listener & resumes the queue. If the request completes before this callback fires, use clearTimeout. Behind the scenes the OS may still be trying to open a TCP connection, but there's no reason the BLC queue has to block. This is a huge performance problem. Instead of testing hundreds of URLs a second I'm only getting a throughput of about 1 URL a second after averaging in all the blocking, and I have to check hundreds of thousands of URLs.
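Roughly what I mean, as a minimal sketch. The names withTimeout, drainQueue and checkUrl are made up for illustration and are not part of BLC's API; checkUrl stands in for whatever performs one check and returns a promise. The 20000 ms default is just the 20 seconds suggested above.

```javascript
// Hypothetical helper, not part of broken-link-checker.
// Settles with the task's result, or rejects after `ms` milliseconds,
// whichever happens first.
function withTimeout(task, ms) {
  return new Promise((resolve, reject) => {
    // Register the user-land timeout.
    const timer = setTimeout(() => {
      reject(new Error(`timed out after ${ms} ms`));
    }, ms);

    task.then(
      (value) => {
        clearTimeout(timer); // request finished first: cancel the timeout
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}

// Hypothetical worker loop: checks one URL at a time, but never waits
// longer than `ms` for any single URL before moving on to the next.
async function drainQueue(urls, checkUrl, ms = 20000) {
  const results = [];
  for (const url of urls) {
    try {
      const value = await withTimeout(checkUrl(url), ms);
      results.push({ url, ok: true, value });
    } catch (err) {
      results.push({ url, ok: false, error: err.message });
    }
  }
  return results;
}
```

Note this does not cancel the underlying request. The OS may keep trying to open the TCP connection in the background, but the queue stops waiting on it.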

joshribakoff • Aug 16 '17 04:08

If a request is not timing out, then this is an issue with bhttp.

stevenvachon • Aug 16 '17 04:08

It does time out, eventually (after about 30-60 seconds). I want to lower the timeout to maybe 5-10 seconds, but this library provides no API to do so.

joshribakoff • Aug 16 '17 04:08