napari.github.io
Automatically check for broken links
We need an automated link checker to catch broken links (404s) in the documentation.
Ideally it would run as part of the CI checks.
cc @DragaDoncila (I think you had some good suggestions for tools you've seen elsewhere that could do this)
Completely agree we need one! I haven't used any in a while (and helpfully also don't remember what I have used in the past), but I can look for some options. Also pinging @codemonkey800 who might know some off the top of his head, or point out any concerns/foreseen difficulties with using one for napari.org.
No concerns, it would be a good addition to the CI checks. Initial Google searches found tools like LinkChecker for Python and broken-link-checker for JavaScript.
I looked into this a bit more today, and I think this GitHub Action for lychee link checking seems like the simplest option: https://github.com/marketplace/actions/lychee-broken-link-checker
I especially like that you can run it as a cron job and it will automatically create issues for you if it finds any broken links. I say we should run it, idk, once a week?
Reopening this issue, because we are having ongoing problems.
They mostly come down to not being able to ignore specific links reliably, so far too many hits are being reported in the automated "Link Checker Report" issues.
Attempted fixes (that haven't completely resolved the problem):
- https://github.com/napari/napari.github.io/pull/294
- https://github.com/napari/napari.github.io/pull/297
What we need next to move on
This is a call for volunteers
Somebody needs to dig into this problem more. I've just started a new job, so I don't have time available currently to do that.
Possible approaches
- I've been using this demo repo to test lychee, so we don't always have to tinker with the main napari.github.io repository. Anyone is welcome to duplicate this repo and try stuff with it: https://github.com/GenevieveBuckley/broken-link-checker
- Troubleshooting suggestion from Draga: echo the current working directory from inside the GitHub workflow, just to check that we are where we expect to be and that .lycheeignore is in the right spot
- Troubleshooting suggestions from Justin:
  - Modify the .lycheeignore file to ignore ALL the links (so we can see if it’s finding the ignore file at all)
  - Perhaps there is some subtle syntax error in the ignore file? Modify the ignore file to exclude only one specific link and then check whether that link was actually excluded.
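For anyone picking this up, here is a rough Python sketch of the troubleshooting steps above that can be run locally or from a workflow step. It prints the working directory, checks whether .lycheeignore is there, and tests a URL against the patterns in it. It rests on some assumptions: that the ignore file holds one regular expression per line, that blank lines and lines starting with `#` can be skipped, and that Python's `re` module is close enough to the regex engine lychee uses for simple patterns. The function names and the sample URL are made up for illustration; none of this is lychee's own code.

```python
import re
from pathlib import Path
from typing import List, Optional


def find_ignore_file(directory: Path, name: str = ".lycheeignore") -> Optional[Path]:
    """Return the ignore file inside `directory`, or None if it is not there."""
    candidate = directory / name
    return candidate if candidate.is_file() else None


def load_patterns(text: str) -> List["re.Pattern[str]"]:
    """Compile one regex per line.

    Assumption: blank lines and lines starting with '#' are skipped.
    """
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns


def is_ignored(url: str, patterns: List["re.Pattern[str]"]) -> bool:
    """True if any pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)


if __name__ == "__main__":
    cwd = Path.cwd()
    print(f"Working directory: {cwd}")

    ignore_file = find_ignore_file(cwd)
    if ignore_file is None:
        print("No .lycheeignore found in the working directory")
    else:
        patterns = load_patterns(ignore_file.read_text(encoding="utf-8"))
        print(f"Loaded {len(patterns)} pattern(s) from {ignore_file}")
        # Hypothetical URL; swap in one link that should be excluded.
        sample_url = "https://example.com/some/page"
        print(f"{sample_url} ignored: {is_ignored(sample_url, patterns)}")
```

If a link that should be excluded comes back as not ignored here, the pattern itself is the likely culprit. If it is ignored here but still shows up in the report, the file location or the way the workflow passes it to lychee is the more likely problem.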