Broken-Link-Crawler
:robot: Python bot that crawls your website looking for dead stuff
This was written for my tutorial on building a dead link checker, so its scope has been kept quite small.
Broken Link Crawler
Let's say I have a website and I want to find any dead links and images on it.
$ python deadseeker.py 'https://healeycodes.com/'
> 404 - https://docs.python.org/3/library/missing.html
> 404 - https://github.com/microsoft/solitare2
The website is crawled, and a request is sent to the URL in every href and src attribute. Errors are reported. This bot doesn't observe robots.txt, but you should.
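The idea above can be sketched with the standard library alone. This is a minimal illustration, not the code in deadseeker.py: the names LinkCollector, extract_links, check, and allowed_by_robots are hypothetical, and the use of HEAD requests and the "dead-link-sketch" user agent are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every href and src attribute (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths against the page they were found on.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the href/src URLs found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def check(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    # Assumption: HEAD is enough to detect dead links; some servers reject it.
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "dead-link-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a dead link
    except urllib.error.URLError:
        return None  # DNS failure, refused connection, timeout, ...


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A polite crawler would fetch robots.txt once per host, skip any URL for which allowed_by_robots returns False, and report every URL whose check result is 400 or above (or None).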
It is not a clever bot. But it is a good bot.
Accepting (small) PRs and issues!