broken-link-checker
Feature Request: Less Verbose Options (Broken-Only, 404-Only, etc)
I find BLC to be extremely useful, but the output has too much information (I'm betting the majority of users are looking for the BROKEN links, not the OK links). It would be great to have a CLI option for outputting only broken links, or only certain types of errors (404, 403, etc), and any page with 0 broken links would have nothing output at all.
For example I get output like this now:
Getting links from: https://www.example.com/archives/
├───OK─── https://docs.example.com/install/
├─BROKEN─ https://docs.example.com/archives.html (HTTP_404)
├───OK─── https://example.com/doc
├───OK─── https://example.com/docs
├───OK─── https://example.com/docs/archives
├───OK─── https://example.com/content/archives.html
├───OK─── https://example.com/example-docs/
└───OK─── https://example.com/master/doc
Finished! 88 links found. 80 excluded. 1 broken.
Getting links from: https://docs.example.com/ssh/
├───OK─── https://the.earth.li/%7Esgtatham/putty/0.67/htmldoc/Chapter8.html#pubkey-puttygen
├───OK─── https://wiki.eclipse.org/EGit/User_Guide#Eclipse_SSH_Configuration
├───OK─── https://www.digitalocean.com/community/tutorials/understanding-the-ssh-encryption-and-connection-process
└───OK─── http://www.chiark.greenend.org.uk/%7Esgtatham/putty/download.html
Finished! 120 links found. 115 excluded. 0 broken.
A --less-verbose flag would output only this (the second link scanned would output nothing since there were no broken links):
Broken link(s) from: https://www.example.com/archives/
└─BROKEN─ https://docs.example.com/archives.html (HTTP_404)
An interim solution is to use a pipe to grep:
blc -r https://www.example.com/archives/ | grep --color=never -e 'Getting links' -e '404' -e 'Finished!'
I threw this together - it adds a -q/--quiet flag to only show broken pages & links: https://github.com/alexlouden/broken-link-checker
An interim solution is to use a pipe to grep.
blc -r https://www.example.com/archives/ | grep --color=never -e 'Getting links' -e '404' -e 'Finished!'
Thanks for this, but it's not really that helpful: it shows every page, even if that page has nothing broken. So if you've got one broken page among thousands, you have to go through thousands of lines to find the one that has the broken link.
Would you accept a quiet option patch that only outputs page names if something is broken?
Hey @greggman - I've implemented this in my fork, if you'd like to have a look? https://github.com/alexlouden/broken-link-checker
We're using my version at work in our CI and it makes it a lot clearer to see what's broken
@alexlouden that's great. Have you submitted a PR?
Just submitted one @greggman - thanks for the push 😃
@greggman
If you dump tasmo's suggestion above into a text file you can run the following against it to remove the redundant "Getting links from" noise.
sed '/Getting links from/{$!N;/\n.*Getting links from/!P;D}' file
This command removes a line containing "Getting links from" if it is immediately followed by another "Getting links from" line.
Can you update this command to match the new syntax, which includes:
Finished! # links found. # excluded. # broken.
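One possible way to cope with the "Finished!" lines is to switch from sed to awk and buffer each page header until a broken link actually shows up. This is only a sketch, not something blc provides: the function name blc_broken_only is made up for illustration, and it assumes the output format quoted in this thread (a "Getting links from" header, link lines marked BROKEN, and one "Finished!" summary per page).

```shell
# Hypothetical helper (not part of blc): keep only pages that have broken links.
# Assumes every broken link line contains the text "BROKEN" and every page
# ends with its own "Finished!" summary line, as in the output quoted above.
blc_broken_only() {
  awk '
    # Remember the page header but do not print it yet.
    /Getting links from/ { header = $0; shown = 0; next }

    # First broken link on a page: print the buffered header, then the link.
    /BROKEN/ {
      if (!shown) { if (header != "") print header; shown = 1 }
      print
      next
    }

    # Print the summary only for pages that were shown, then reset.
    /Finished!/ { if (shown) print; shown = 0 }
  '
}
```

It would be used as a filter on the unfiltered blc output, for example:
blc -r https://www.example.com/archives/ | blc_broken_only
Pages with 0 broken links then produce no output at all, and OK lines are dropped everywhere. Matching on BROKEN rather than 404 keeps other error types as well.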
@alexlouden Thanks for your fork! It works well to lower the noise level. I installed it globally with:
npm install git+https://github.com/alexlouden/broken-link-checker -g