broken-link-checker
Latest BLC does not finish properly
Most of the time, the latest BLC (v0.7.6) silently fails partway through a run. As a result, it does not report any results, and the exit code is of no use.
The last known version without this bug is v0.7.3.
See the output:
$ docker run -it node:8.9.1-alpine /bin/sh
/ # npm install -g broken-link-checker
npm WARN deprecated nopter@0.3.0: try optionator
/usr/local/bin/blc -> /usr/local/lib/node_modules/broken-link-checker/bin/blc
/usr/local/bin/broken-link-checker -> /usr/local/lib/node_modules/broken-link-checker/bin/blc
+ broken-link-checker@0.7.6
added 100 packages in 3.561s
/ # blc https://google.com
Getting links from: https://google.com/
├───OK─── https://www.google.com/imghp?hl=en&tab=wi
├───OK─── https://maps.google.com/maps?hl=en&tab=wl
├───OK─── https://play.google.com/?hl=en&tab=w8
├───OK─── https://news.google.com/nwshp?hl=en&tab=wn
├───OK─── https://mail.google.com/mail/?tab=wm
├───OK─── https://www.youtube.com/?gl=US&tab=w1
├───OK─── https://drive.google.com/?tab=wo
├───OK─── https://www.google.com/intl/en/options/
├───OK─── http://www.google.com/history/optout?hl=en
├───OK─── https://www.google.com/preferences?hl=en
├───OK─── https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/
├───OK─── https://www.google.com/search?site=&ie=UTF-8&q=Chinua+Achebe&oi=ddle&ct=chinua-achebes-87th-birthday-5104396332433408&hl=en&sa=X&ved=0ahUKEwjC4dzIo8TXAhUB4iYKHYtvB7UQPQgD
├───OK─── https://www.google.com/logos/doodles/2017/chinua-achebes-87th-birthday-5104396332433408.3-l.png
├───OK─── https://www.google.com/advanced_search?hl=en&authuser=0
├───OK─── https://www.google.com/language_tools?hl=en&authuser=0
├───OK─── https://www.google.com/intl/en/ads/
├───OK─── https://www.google.com/services/
├───OK─── https://plus.google.com/116899029375914044550
├───OK─── https://www.google.com/intl/en/about.html
├───OK─── https://www.google.com/intl/en/policies/privacy/
/ #
I've just had the same thing happen to me. It took a lot longer on my site crawl (it had logged about 15 MB of output to file), but it is now sitting there spinning and not going anywhere.
Did anyone find a workaround to this?
Node version? Also, try the v0.8.0 branch
Node v6.11.4, BLC 0.7.7
Not sure if relevant, but we were seeing a hang in our own broken link checker that uses this library. My colleague implemented a small workaround in our code that seems to be helping: https://github.com/code-dot-org/code-dot-org/pull/21310
Update: we've continued getting zombie processes that didn't exit, after all.
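For jobs that shell out to the `blc` CLI, an outer deadline at least keeps a hung run from lingering forever. The following is a minimal sketch, not a fix for the underlying hang: `runWithDeadline` is a hypothetical helper name, and the deadline value is an assumption to tune per site.

```javascript
const { spawn } = require('child_process')

// Hypothetical helper: run a command and kill it if it outlives deadlineMs.
// Resolves with the exit code (null if killed) and whether the deadline fired.
function runWithDeadline(command, args, deadlineMs) {
  return new Promise((resolve) => {
    const child = spawn(command, args, { stdio: 'inherit' })
    let timedOut = false

    const timer = setTimeout(() => {
      timedOut = true
      child.kill('SIGKILL') // the child is presumed stuck, so do not wait for a clean exit
    }, deadlineMs)

    // Fired when the command cannot be started at all (e.g. not installed).
    child.on('error', (error) => {
      clearTimeout(timer)
      resolve({ code: null, timedOut, error })
    })

    child.on('exit', (code) => {
      clearTimeout(timer)
      resolve({ code, timedOut })
    })
  })
}

// Usage sketch (assumes `blc` is on the PATH; 10 minutes is an arbitrary choice):
// runWithDeadline('blc', ['https://google.com'], 10 * 60 * 1000)
//   .then(({ code, timedOut }) => { process.exitCode = timedOut ? 1 : code })
```

Treating a timeout as a failure keeps the exit code meaningful even when the checker never reports its own result.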
I've been getting something similar: BLC will just spin. For what it's worth, I traced it down to the lookup of this address: https://www.sothebys.com/en/ I didn't notice anything unusual on that page or in the headers, so no clue past that.
For what it's worth, I've struck this problem too, where it seems BLC simply hangs near the end of processing the links.
It's been working fine for me for a long time (version 0.7.6), but it suddenly started hanging and never completing. I suspect there's a particular link somewhere that's not getting processed correctly (an unresolved promise?), though I notice that when I process different quantities of links (e.g. 10, 100, 400), it processes pretty much all of them before hanging.
In order to work around this, I've used a setTimeout in the link API callback, such that if the setTimeout is not cleared within 30 seconds, it calls my finish routine that would normally be called by the end API callback:
let timeout
let linkCount = 0
...
let htmlUrlChecker = new blc.HtmlUrlChecker({
  excludeInternalLinks: true,
  cacheResponses: false,
  excludeLinksToSamePage: true,
}, {
  link: function (result) {
    linkCount++
    log(`Processed ${linkCount}: ${result.url.original}`)
    // Restart the watchdog every time a link result arrives.
    clearTimeout(timeout)
    timeout = setTimeout(() => {
      // broken-link-checker may not finish -- refer:
      // * https://github.com/stevenvachon/broken-link-checker/issues/90
      // It does however seem to always get stuck almost at the end.
      // After waiting 30 seconds for the next link to be processed,
      // we'll exit.
      finish()
    }, 30000) // 30 seconds
  },
  end: function () {
    // Cancel the pending watchdog so finish() is not called a second time.
    clearTimeout(timeout)
    finish()
  },
})
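The same idea can be pulled out into a small reusable helper. This is a minimal sketch under stated assumptions: `once` and `createWatchdog` are hypothetical names, not part of broken-link-checker, and the example drives them with stand-in calls rather than a real crawl.

```javascript
// Hypothetical helper: wrap a function so that only the first call runs.
function once(fn) {
  let called = false
  return function (...args) {
    if (called) return undefined
    called = true
    return fn.apply(this, args)
  }
}

// Hypothetical helper: call onStall if kick() is not called again within ms.
function createWatchdog(ms, onStall) {
  let timer = null
  return {
    // Restart the countdown; meant to be called from the `link` handler.
    kick() {
      clearTimeout(timer)
      timer = setTimeout(onStall, ms)
    },
    // Cancel the countdown; meant to be called once the work is done.
    stop() {
      clearTimeout(timer)
      timer = null
    },
  }
}

// Stand-in wiring: the short 50 ms window is only for demonstration.
const outcomes = []
const finish = once((reason) => {
  watchdog.stop()
  outcomes.push(reason)
})
const watchdog = createWatchdog(50, () => finish('stalled'))

watchdog.kick() // would go in the `link` handler
watchdog.kick()
// The `end` handler would call finish('ended') instead of waiting for the stall.
```

Guarding `finish` with `once` means it does not matter whether the stall timer or the `end` handler gets there first.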
I'm also having the same issue. @jcdarwin's suggestion is exactly what I was thinking of doing, so I'm glad to know that I'm not the only one dealing with this.
However, it would be good to find out why the process hangs close to the end. Right now we're unable to check roughly 40 links in a database of more than 1000 links.
Still seeing this today…
In order to work around this, I've used a setTimeout in the link API callback, such that if the setTimeout is not cleared within 30 seconds, it calls my finish routine that would normally be called by the end API callback:
@jcdarwin What does your finish() routine look like? I tried to figure out how to un-stall it by looking at the project source, but didn’t see a clear way to access done() on the item.
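The routine itself is not shown in this thread. As a minimal sketch of what such a `finish()` could look like, assuming the `link` handler pushes broken URLs into a hypothetical `brokenLinks` array: it makes no attempt to un-stall the checker. It only reports what was collected and calls `process.exit()`, which ends the process even while the stuck requests are still pending.

```javascript
// Hypothetical state: filled by the `link` handler during the crawl
// (for example when a result is flagged as broken).
const brokenLinks = []
let finished = false

// Hypothetical finish(): safe to call from both the timeout and the `end` handler.
// The exit function is injectable so the routine can be exercised without
// terminating the current process.
function finish(reason, exit = process.exit) {
  if (finished) return false // a second caller (timeout or `end`) is ignored
  finished = true

  console.log(`Finished (${reason}); ${brokenLinks.length} broken link(s)`)
  for (const url of brokenLinks) {
    console.log(`  BROKEN ${url}`)
  }

  // A non-zero exit code lets CI treat broken links as a failure.
  exit(brokenLinks.length > 0 ? 1 : 0)
  return true
}
```

In the workaround above, the timeout would call something like `finish('timeout')` and the `end` handler `finish('end')`.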