HTML parser fails for some sites

Open vvdwivedi opened this issue 4 years ago • 3 comments

Describe the bug: I am testing a recursive scan against two sites:

  1. https://pg.vvdwivedi.com/
  2. https://www.capillarytech.com/

The first one runs fine. I am using the CLI from the bin folder to run the built code. After digging through the code, I saw that parseHTML (https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/parseHTML.js) is having issues, specifically with:

parser.once(FINISH_EVENT, () => { resolve(parser.document) });

There is a TODO with an issue link (https://github.com/sindresorhus/got/issues/834) at that part of the code. That issue is marked as closed, but even after trying the solution suggested there, my scan still does not work.
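To make the hang visible instead of silent, the wait on the finish event could be guarded with a timeout and an error listener. This is only a minimal sketch, assuming the parser behaves like a Node.js Writable stream; `waitForFinish` and the stuck stream are hypothetical names for illustration and are not part of broken-link-checker. It does not explain why the event is missing, it only turns a scan that never ends into a rejected promise.

```javascript
const { Writable } = require("stream");

// Hypothetical helper (not part of broken-link-checker): settles when the
// stream emits "finish" or "error", and rejects if neither arrives in time.
function waitForFinish(stream, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`"finish" not emitted within ${timeoutMs} ms`));
    }, timeoutMs);

    const onFinish = () => {
      cleanup();
      resolve();
    };
    const onError = (error) => {
      cleanup();
      reject(error);
    };

    // Remove the timer and both listeners so nothing keeps the process alive.
    function cleanup() {
      clearTimeout(timer);
      stream.removeListener("finish", onFinish);
      stream.removeListener("error", onError);
    }

    stream.once("finish", onFinish);
    stream.once("error", onError);
  });
}

// Stand-in for a parser that never finishes: the write callback is never
// called, so end() cannot complete and "finish" is never emitted.
const stuckStream = new Writable({
  write(chunk, encoding, callback) {
    // Intentionally never calls callback().
  },
});

waitForFinish(stuckStream, 100).catch((error) => {
  console.error(`Parse did not complete: ${error.message}`);
});
stuckStream.end("<p>example</p>");
```

In parseHTML, the same guard would sit around the `parser.once(FINISH_EVENT, ...)` call, with the timeout value chosen by the caller.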

To Reproduce: Build the code. From the root directory, execute:

./bin/blc https://www.capillarytech.com/ -r

./bin/blc https://pg.vvdwivedi.com -r

Expected behavior: Both should run a full recursive scan.

Environment:

  • OS and version: macOS Catalina (10.15.2)
  • Node.js version: 12.13.1
  • broken-link-checker version: v-8-alpha (checking after building from master branch)

vvdwivedi, Mar 31 '20 17:03

Yeah, the "finish" event isn't emitting for some reason.

stevenvachon, Feb 24 '21 00:02

@vvdwivedi I'm facing the same issue.

gauravgandhi1315, Feb 07 '22 19:02

Any updates here?

aarongustafson, Mar 31 '23 16:03