
Fails with TypeError on recursive

Open dbogatov opened this issue 5 years ago • 7 comments

Describe the bug

Exits with TypeError: Cannot read property 'call' of null when run against (at least) my website using -r.

Full log:
# ./bin/blc --filter-level 3 -r https://dbogatov.org

Starting recursive scan...

Getting links from: https://dbogatov.org/

Getting links from: https://dbogatov.org/
TypeError: Cannot read property 'isAllowed' of null

======================
Links found: 0
Links skipped: 0
Links OK: 0
Links broken: 0
Time elapsed: 1 second
======================

├───OK─── https://www.googletagmanager.com/gtag/js?id=UA-65293382-4
Finished! 1 links found. 0 broken.
TypeError: Cannot read property 'call' of null
    at HtmlUrlChecker._completedPage2 (/broken-link-checker/lib-cjs/public/HtmlUrlChecker.js:264:44)
    at HtmlChecker.<anonymous> (/broken-link-checker/lib-cjs/public/HtmlUrlChecker.js:142:407)
    at HtmlChecker.emit (events.js:315:20)
    at HtmlChecker.emit (/broken-link-checker/lib-cjs/internal/SafeEventEmitter.js:20:13)
    at HtmlChecker._complete2 (/broken-link-checker/lib-cjs/public/HtmlChecker.js:211:8)
    at UrlChecker.<anonymous> (/broken-link-checker/lib-cjs/public/HtmlChecker.js:99:395)
    at UrlChecker.emit (events.js:315:20)
    at UrlChecker.emit (/broken-link-checker/lib-cjs/internal/SafeEventEmitter.js:20:13)
    at RequestQueue.<anonymous> (/broken-link-checker/lib-cjs/public/UrlChecker.js:68:54)
    at RequestQueue.emit (events.js:315:20)
    at RequestQueue._removeItem2 (/broken-link-checker/node_modules/limited-request-queue/lib-es5/index.js:373:65)
    at /broken-link-checker/node_modules/limited-request-queue/lib-es5/index.js:303:63
    at RequestQueue.<anonymous> (/broken-link-checker/lib-cjs/public/UrlChecker.js:67:7)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

To Reproduce

Here is what I did:


$ docker run -it node:14.3.0-alpine3.10 /bin/sh
apk add --update bash git
git clone https://github.com/stevenvachon/broken-link-checker.git
cd broken-link-checker/
npm install
npm run build

# here is the call
./bin/blc --filter-level 3 -r https://dbogatov.org

Expected behavior

Earlier versions (e.g. v0.7.x) work fine.

For the record, the reason I tried to switch to v0.8 is that all of a sudden, earlier versions started to dislike perfectly fine SSL certificates...

Environment:

  • OS and version: node:14.3.0-alpine3.10 /bin/sh Docker image
  • Node.js version: 14.3.0
  • broken-link-checker version: built from master (a08abcdbec91197a7232d720989e6fb608517e46)

dbogatov • May 31 '20 19:05

@dbogatov One workaround is to serve a stub /robots.txt.

nginx example:

location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}

viatcheslavmogilevsky • Oct 19 '20 22:10

@viatcheslavmogilevsky

What do you mean? Do you suggest that I modify my server config just to make BLC not fail?

dbogatov • Oct 20 '20 02:10

@dbogatov

Yes, it's just a workaround.

It seems blc fails when the site has no /robots.txt.
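That fits the first error in the log, `Cannot read property 'isAllowed' of null`: it looks as if the parsed robots.txt is `null` when the site serves none, and a method is then called on it anyway. Below is a minimal sketch of that failure mode and a guard. It is not blc's actual code: `RobotsRules`, `loadRobots`, `isAllowedUnsafe` and `isAllowedSafe` are hypothetical names, and the toy parser only understands `Disallow:` prefixes. Treating a missing robots.txt as "nothing disallowed" follows the usual crawler convention (RFC 9309 lets a crawler access any resource when robots.txt returns a 4xx status).

```typescript
// Hypothetical parsed robots.txt rules; not blc's real type.
interface RobotsRules {
  isAllowed(path: string): boolean;
}

// Toy loader: returns null when the site serves no robots.txt (body === null),
// which is the situation suspected in this thread. Only "Disallow: <prefix>"
// lines are understood; User-agent groups are ignored.
function loadRobots(body: string | null): RobotsRules | null {
  if (body === null) return null;
  const prefixes = body
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.toLowerCase().startsWith("disallow:"))
    .map((line) => line.slice("disallow:".length).trim())
    .filter((prefix) => prefix.length > 0); // an empty Disallow allows everything
  return {
    isAllowed: (path: string) => !prefixes.some((prefix) => path.startsWith(prefix)),
  };
}

// Unguarded call: throws a TypeError when robots is null, much like
// "Cannot read property 'isAllowed' of null" in the log.
function isAllowedUnsafe(robots: RobotsRules | null, path: string): boolean {
  return robots!.isAllowed(path);
}

// Guarded call: a missing robots.txt is treated as "nothing disallowed".
function isAllowedSafe(robots: RobotsRules | null, path: string): boolean {
  return robots === null ? true : robots.isAllowed(path);
}

const noRobots = loadRobots(null);
console.log(isAllowedSafe(noRobots, "/about")); // true
try {
  isAllowedUnsafe(noRobots, "/about");
} catch (err) {
  console.log((err as Error).name); // TypeError
}
```

The guard lives at the call site rather than in the loader, so code that needs to know whether a robots.txt existed can still see the `null`.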

viatcheslavmogilevsky • Oct 20 '20 08:10

@viatcheslavmogilevsky

This seems like an impractical workaround. I test dozens of websites running NGINX, Apache, or plain .NET Core / Node.js. For some of those servers I don't even have access to the configs. On top of that, adding a hack to a server config just to make one buggy CI tool pass is, IMHO, bad practice.

Thanks for the idea anyway! Good catch that robots.txt is related to the issue!

dbogatov • Oct 20 '20 19:10

@viatcheslavmogilevsky

By the way, out of frustration that these bugs have not been fixed for years, I decided to write my own scaled-down alternative to BLC. It works well, at least for my websites!

https://github.com/dbogatov/broken-links-inspector

dbogatov • Oct 20 '20 19:10

Anyway, there is another bug even with the /robots.txt workaround: in v0.8.0-alpha, recursive mode doesn't work for me. It checks all internal links, but doesn't visit them.

In v0.7.8 it does visit all internal links (recursive mode), but it seems not to work with sites that only support HTTP/2.
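The difference described here, checking internal links versus also visiting them, is what makes a scan recursive: every same-site page whose link is checked should itself be fetched and parsed for more links. A minimal breadth-first sketch of that loop follows, assuming an in-memory `Map` stands in for real HTTP requests (a URL missing from the map counts as broken). `Site`, `ScanResult`, `recursiveScan` and the example.org URLs are hypothetical and say nothing about how blc implements `-r`.

```typescript
// Hypothetical in-memory site: URL -> links found on that page.
// A URL absent from the map stands in for a broken link (no real HTTP here).
type Site = Map<string, string[]>;

interface ScanResult {
  visited: string[]; // internal pages that were fetched and parsed
  broken: string[];  // links that failed the simulated check
}

// Breadth-first recursive scan. Every unique link is checked once;
// only internal links (same origin prefix) are queued to be visited.
function recursiveScan(site: Site, startUrl: string, origin: string): ScanResult {
  const seen = new Set<string>([startUrl]);
  const queue: string[] = [startUrl];
  const result: ScanResult = { visited: [], broken: [] };

  while (queue.length > 0) {
    const page = queue.shift()!;
    const links = site.get(page);
    if (links === undefined) {
      result.broken.push(page); // internal page that does not exist
      continue;
    }
    result.visited.push(page);
    for (const link of links) {
      if (seen.has(link)) continue;
      seen.add(link);
      if (link.startsWith(origin)) {
        queue.push(link); // internal: visit and parse it later
      } else if (!site.has(link)) {
        result.broken.push(link); // external: checked, never visited
      }
    }
  }
  return result;
}

const site = new Map<string, string[]>([
  ["https://example.org/", ["https://example.org/a", "https://cdn.example.net/x.js"]],
  ["https://example.org/a", ["https://example.org/", "https://example.org/missing"]],
  ["https://cdn.example.net/x.js", []],
]);
const scan = recursiveScan(site, "https://example.org/", "https://example.org/");
console.log(scan.visited.join(", ")); // https://example.org/, https://example.org/a
console.log(scan.broken.join(", "));  // https://example.org/missing
```

Matching on an origin that ends in a slash keeps look-alike hosts such as `https://example.org.evil.test/` from being treated as internal.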

viatcheslavmogilevsky • Oct 22 '20 11:10

@dbogatov I was getting the same error, and it was never fixed! I did the same and wrote my own alternative to BLC.

gauravgandhi1315 • Sep 29 '21 16:09