imagej.github.io
Ensure there are no broken links, and automate the check
The script `_bin/broken-links.sh` prints out links that it detects as broken. But it needs updating to handle additional cases:
- `/ij/*` – proxied mirror.imagej.net content – needs update to serve from `/ij` and redirect old links from the root where feasible (e.g. `/macros`)
- Other repos in this org: `/presentations`, `/workshops`, `/tutorials`, `/list-of-update-sites`, others?
List of known observed weirdness so far:
- BigDataServer: `INFO__` are linkish, but are surrounded in backticks. These should be fixed, and other instances of backtick-mangling should be checked for.
- Some MediaWiki-style links got escaped with backslashes (`\[link title\]`). I fixed many of them, but it would be good to double check there aren't any remaining.
- `(Category_Segmentation)` (and similar) links, which `_bin/broken-links.sh` does not find.
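As a rough illustration of how the first two kinds of weirdness could be searched for, here is a minimal sketch. It is not part of `_bin/broken-links.sh`; the function name `find_suspect_links` and both patterns are assumptions made up for this example, and a backticked URL is only a candidate for review, since some are intentional.

```python
import re

# Backslash-escaped bracket pair left over from MediaWiki, e.g. \[link title\]
ESCAPED_LINK = re.compile(r"\\\[[^\]\n]*\\\]")
# A URL wrapped in backticks, e.g. `https://example.org/page` (may be intentional)
BACKTICKED_URL = re.compile(r"`(https?://[^`\s]+)`")


def find_suspect_links(text):
    """Return (line_number, kind, matched_text) tuples for link-like oddities.

    Hypothetical helper: it only flags candidates for a human to review.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ESCAPED_LINK.finditer(line):
            hits.append((number, "escaped-brackets", match.group(0)))
        for match in BACKTICKED_URL.finditer(line):
            hits.append((number, "backticked-url", match.group(1)))
    return hits
```

Running it over each page's Markdown source and printing the hits would give a checklist to work through by hand.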
Once we have a robust dead link checker, we also need to hook it up to an action to check when links break.
See also #55, #63 (IJ1 page renames), #66
Things that may be broken:
- Links to `/media/[subfolder]/` .. currently everything should just go to `/media/`
- Double encoded ampersands (`&`)
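Both of these could probably be caught with a simple scan of the generated HTML. The sketch below is only an assumption about what such a check might look like; `audit_html` and the two patterns are hypothetical names, and it treats `&amp;amp;` as the double-encoded form of an ampersand.

```python
import re

# href/src values that point into a subfolder of /media/, e.g. /media/plugins/a.png
MEDIA_SUBFOLDER = re.compile(r'(?:href|src)="(/media/[^"/]+/[^"]*)"')
# An ampersand that was entity-encoded twice: & -> &amp; -> &amp;amp;
DOUBLE_AMP = re.compile(r"&amp;amp;")


def audit_html(html):
    """Collect links below /media/<subfolder>/ and count double-encoded ampersands."""
    return {
        "media_subfolder_links": MEDIA_SUBFOLDER.findall(html),
        "double_encoded_ampersands": len(DOUBLE_AMP.findall(html)),
    }
```

A link straight to a file in `/media/` is not reported; only paths with an extra folder level are.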
I looked at the `_bin/broken-links.sh` script but don't really understand how to modify it to add these things.
I've been using htmlproofer, which seems great, except I can't get it to understand relative paths from the site root. For example:
$ htmlproofer update-sites/index.html --disable_external --assume_extension --allow-hash-href --url-ignore "///list-of-update-sites/"
produces hundreds of failures of the form:
<a href="/update-sites/tos">ToS for personal update sites</a>
* internally linking to /update-sites/tos, which does not exist (line 90)
<a href="/update-sites/tos">ToS for personal update sites</a>
* internally linking to /update-sites/tos, which does not exist (line 234)
<a href="/update-sites/tos">ToS for personal update sites</a>
* internally linking to /update-sites/tos, which does not exist (line 234)
but running the same command on the root `index.html` works fine even though the links are the same. But "true" relative paths, e.g. `../update-sites/tos`, would work.
This led me down a horrible path:
- Jekyll 4.2 does something different in generating relative paths. When I build with 4.2 and go to my local `/update-sites/` page, the Automatic Uploads sidebar link breaks because it adds a second `/update-sites/` to the URL. Building on 3.9 doesn't have this issue.
- Should we even have relative links?
I am highly tempted to just copy all the pages to the base directory and run the check against them there..
@hinerm If you add `--root-dir=_site`, those bogus errors should go away.
As for whether we should have relative links: no, we shouldn't. But `/update-sites/tos` is not a relative link, it's an absolute one (just to make sure we have our shared terminology straight). I am a fan of all internal links starting with `/`, and eschewing the `relative_url` Liquid filter completely.
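To make the terminology concrete, here is a small sketch of how a link checker might map an href onto a file path. It is an illustration only, not how htmlproofer is implemented; `resolve_href` and its parameters are hypothetical names. The point is that a link starting with `/` can only be resolved once the checker knows the site root, while a relative link depends on which page it appears on.

```python
import posixpath


def resolve_href(href, page_dir, site_root):
    """Map an href to a path on disk, the way a static checker might.

    Hypothetical helper: page_dir is the directory of the linking page,
    site_root is the directory that "/" refers to (e.g. "_site").
    """
    if href.startswith("/"):
        # Absolute (root-anchored) link: independent of the linking page.
        return posixpath.normpath(posixpath.join(site_root, href.lstrip("/")))
    # Relative link: anchored at the directory of the linking page.
    return posixpath.normpath(posixpath.join(page_dir, href))


page_dir = "_site/update-sites"
print(resolve_href("/update-sites/tos", page_dir, "_site"))
print(resolve_href("../update-sites/tos", page_dir, "_site"))
print(resolve_href("update-sites/tos", page_dir, "_site"))
```

The first two calls land on the same file, while the third picks up a second `update-sites` segment, which is the same doubling described for the sidebar link above.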
> @hinerm If you add `--root-dir=_site`, those bogus errors should go away.
Thank you @ctrueden!
The script `_bin/check-site-html.sh` now checks for broken links using `htmlproofer` (thanks @hinerm!). This issue is now "half done" in that the automation is robust. We just need to:
- Actually fix all outstanding broken links; and
- Hook up the `_bin` checker scripts to CI to ensure nothing breaks again in future.