broken-link-checker
Fails with TypeError on recursive
Describe the bug
Exits with TypeError: Cannot read property 'call' of null when run against (at least) my website using -r.
See full log
# ./bin/blc --filter-level 3 -r https://dbogatov.org
Starting recursive scan...
Getting links from: https://dbogatov.org/
Getting links from: https://dbogatov.org/
TypeError: Cannot read property 'isAllowed' of null
======================
Links found: 0
Links skipped: 0
Links OK: 0
Links broken: 0
Time elapsed: 1 second
======================
├───OK─── https://www.googletagmanager.com/gtag/js?id=UA-65293382-4
Finished! 1 links found. 0 broken.
TypeError: Cannot read property 'call' of null
at HtmlUrlChecker._completedPage2 (/broken-link-checker/lib-cjs/public/HtmlUrlChecker.js:264:44)
at HtmlChecker.<anonymous> (/broken-link-checker/lib-cjs/public/HtmlUrlChecker.js:142:407)
at HtmlChecker.emit (events.js:315:20)
at HtmlChecker.emit (/broken-link-checker/lib-cjs/internal/SafeEventEmitter.js:20:13)
at HtmlChecker._complete2 (/broken-link-checker/lib-cjs/public/HtmlChecker.js:211:8)
at UrlChecker.<anonymous> (/broken-link-checker/lib-cjs/public/HtmlChecker.js:99:395)
at UrlChecker.emit (events.js:315:20)
at UrlChecker.emit (/broken-link-checker/lib-cjs/internal/SafeEventEmitter.js:20:13)
at RequestQueue.<anonymous> (/broken-link-checker/lib-cjs/public/UrlChecker.js:68:54)
at RequestQueue.emit (events.js:315:20)
at RequestQueue._removeItem2 (/broken-link-checker/node_modules/limited-request-queue/lib-es5/index.js:373:65)
at /broken-link-checker/node_modules/limited-request-queue/lib-es5/index.js:303:63
at RequestQueue.<anonymous> (/broken-link-checker/lib-cjs/public/UrlChecker.js:67:7)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
To Reproduce
Here is what I did:
$ docker run -it node:14.3.0-alpine3.10 /bin/sh
apk add --update bash git
git clone https://github.com/stevenvachon/broken-link-checker.git
cd broken-link-checker/
npm install
npm run build
# here is the call
./bin/blc --filter-level 3 -r https://dbogatov.org
Expected behavior
Earlier versions (e.g. v0.7.x) work fine.
For the record, the reason I tried to switch to v0.8 is that, all of a sudden, earlier versions started to dislike perfectly fine SSL certificates...
Environment:
- OS and version:
node:14.3.0-alpine3.10 /bin/sh (Docker image)
- Node.js version: 14.0.3
- broken-link-checker version: built from master (a08abcdbec91197a7232d720989e6fb608517e46)
@dbogatov one workaround is to serve a /robots.txt path
nginx example:
location = /robots.txt {
return 200 "User-agent: *\nDisallow: /\n";
}
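For sites served by Node.js rather than nginx, the same workaround might look roughly like the sketch below. It assumes a plain `http` server; `robotsHandler` and `createRobotsServer` are hypothetical names, and the response body is copied from the nginx example above.

```javascript
const http = require("http");

// Same body as the nginx example above.
const ROBOTS_BODY = "User-agent: *\nDisallow: /\n";

// Hypothetical handler: answers /robots.txt and returns true if it handled the request.
function robotsHandler(req, res) {
  // req.url is a path, so a dummy base is needed to parse it.
  const { pathname } = new URL(req.url, "http://localhost");
  if (pathname !== "/robots.txt") {
    return false;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(ROBOTS_BODY);
  return true;
}

// Hypothetical factory: builds a server that serves only /robots.txt (404 otherwise).
// The caller decides when and where to listen.
function createRobotsServer() {
  return http.createServer((req, res) => {
    if (!robotsHandler(req, res)) {
      res.writeHead(404);
      res.end();
    }
  });
}
```

In a real application the handler would be called before the existing routes, for example `createRobotsServer().listen(8080)` for a standalone test (the port is an arbitrary choice).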
@viatcheslavmogilevsky
What do you mean? Do you suggest that I modify my server config just to make BLC not fail?
@dbogatov
yes, this is just a workaround
it seems blc fails if there is no /robots.txt path
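To see whether a given site matches this hypothesis before running blc, one could probe for /robots.txt first. The sketch below is only an illustration: `robotsUrlFor` and `hasRobotsTxt` are hypothetical helpers, and `fetchImpl` is assumed to be any fetch-compatible function (for example the global `fetch` in Node.js 18 or later; the Node 14 image used above has no built-in fetch).

```javascript
// Build the robots.txt URL for the origin of any page URL.
function robotsUrlFor(siteUrl) {
  return new URL("/robots.txt", siteUrl).href;
}

// Resolve to true when the site answers /robots.txt with a 2xx status.
// fetchImpl is an assumption: any fetch-compatible function will do.
async function hasRobotsTxt(siteUrl, fetchImpl = globalThis.fetch) {
  if (typeof fetchImpl !== "function") {
    throw new TypeError("No fetch implementation available; pass one explicitly");
  }
  try {
    const response = await fetchImpl(robotsUrlFor(siteUrl));
    return response.ok;
  } catch (err) {
    // Network or TLS failures are treated as "no robots.txt available".
    return false;
  }
}
```

Usage would be along the lines of `hasRobotsTxt("https://dbogatov.org").then(console.log)`. A `false` result would be consistent with the missing-robots.txt explanation, though it does not prove it.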
@viatcheslavmogilevsky
This seems like an impractical workaround. I test dozens of websites running NGINX, Apache or plain .NET Core / NodeJS. For some of the servers I don't even have proper access to configs. On top of that, introducing a hack into server config codebase just to make a particular buggy CI tool succeed is, IMHO, a bad practice.
Thanks for the idea anyway!
Good catch that the robots.txt is related to the issue!
@viatcheslavmogilevsky
By the way, out of frustration that the bugs have not been fixed for years, I decided to code my own scaled-down alternative to BLC. Works well at least for my websites!
https://github.com/dbogatov/broken-links-inspector
Anyway, there is another bug even with the /robots.txt workaround: in v0.8.0-alpha, recursive mode doesn't work for me. It checks all internal links, but it doesn't visit them.
In v0.7.8 it does visit all internal links (recursive mode), but it seems it doesn't work with sites that support only HTTP/2.
@dbogatov I was getting the same error, but it was never fixed! I did the same and coded my own alternative to BLC.