imagej.github.io

Ensure there are no broken links, and automate the check

Open ctrueden opened this issue 3 years ago • 5 comments

The script _bin/broken-links.sh prints out links that it detects as broken. But it needs updating to handle additional cases:

  • /ij/* – proxied mirror.imagej.net content – needs update to serve from /ij and redirect old links from the root where feasible (e.g. /macros)
  • Other repos in this org: /presentations, /workshops, /tutorials, /list-of-update-sites, others?

List of known observed weirdness so far:

  • BigDataServer: INFO__ are linkish, but are surrounded in backticks. These should be fixed, and other instances of backtick-mangling should be checked for.
  • Some MediaWiki-style links got escaped with backslashes—\[link title\]—I fixed many of them but would be good to double check there aren't any remaining.
  • (Category_Segmentation) (and similar) links are broken, and _bin/broken-links.sh does not find them.
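
The backslash and backtick cases above could be hunted down with a couple of grep patterns. This is a rough sketch, not part of _bin/broken-links.sh: the function names are hypothetical, it assumes the pages are Markdown files (*.md) and that grep supports -r and --include (GNU or BSD grep), and the backtick pattern is a heuristic that will also flag legitimate inline code containing URLs.

```shell
#!/bin/sh
# Hypothetical helpers; names and the *.md file layout are assumptions.

# List lines that still contain a backslash-escaped MediaWiki-style
# link such as \[link title\].
find_escaped_links() {
  dir="${1:-.}"
  # \\ matches a literal backslash; \[ and \] match literal brackets.
  grep -rnE --include='*.md' '\\\[[^]]*\\\]' "$dir"
}

# List lines where a Markdown link or a bare URL sits between backticks.
# Heuristic only: expect false positives.
find_backticked_links() {
  dir="${1:-.}"
  grep -rnE --include='*.md' '`[^`]*(\]\(|https?://)[^`]*`' "$dir"
}
```

Both functions print matches as file:line:text and return non-zero when nothing is found, which is grep's usual behavior.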

Once we have a robust dead link checker, we also need to hook it up to an action to check when links break.

See also #55, #63 (IJ1 page renames), #66

ctrueden avatar Apr 20 '21 16:04 ctrueden

Things that may be broken:

  • Links to /media/[subfolder]/.. (currently, everything should just go to /media/)
  • Double encoded ampersands (&)
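
Both of these could probably be caught with plain grep before touching the existing script. The following is only a sketch: the function names are made up, it assumes "double encoded" means the literal text &amp;amp; in the built HTML, and the /media/ pattern is a heuristic.

```shell
#!/bin/sh
# Hypothetical helpers; assumes a grep with -r and --include (GNU or BSD).

# Report double-encoded ampersands (assumed to appear as &amp;amp;)
# in built HTML.
find_double_encoded_amps() {
  dir="${1:-_site}"
  grep -rnF --include='*.html' '&amp;amp;' "$dir"
}

# Report links that point below /media/<subfolder>/ instead of /media/.
# The bracket expression stops at characters that usually end a link.
find_media_subfolder_links() {
  dir="${1:-.}"
  grep -rnE --include='*.md' --include='*.html' \
    "/media/[^/\"') <>]+/" "$dir"
}
```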

I looked at the _bin/broken-links.sh script but don't really understand how to modify it to add these things.

hinerm avatar May 06 '21 20:05 hinerm

I've been using htmlproofer, which seems great, except I can't get it to understand relative paths from the site root. For example:

$ htmlproofer update-sites/index.html --disable_external --assume_extension --allow-hash-href --url-ignore "///list-of-update-sites/"

produces hundreds of failures of the form:

     <a href="/update-sites/tos">ToS for personal update sites</a>
  *  internally linking to /update-sites/tos, which does not exist (line 90)
     <a href="/update-sites/tos">ToS for personal update sites</a>
  *  internally linking to /update-sites/tos, which does not exist (line 234)
     <a href="/update-sites/tos">ToS for personal update sites</a>
  *  internally linking to /update-sites/tos, which does not exist (line 234)

However, running the same command on the root index.html works fine, even though the links are the same. "True" relative paths, e.g. ../update-sites/tos, would also work.

This led me down a horrible path:

  • Jekyll 4.2 does something different when generating relative paths. When I build with 4.2 and go to my local /update-sites/ page, the Automatic Uploads sidebar link breaks because it adds a second /update-sites/ to the URL. Building on 3.9 doesn't have this issue.
  • Should we even have relative links?

I am highly tempted to just copy all the pages to the base directory and run the check against them there.

hinerm avatar May 07 '21 20:05 hinerm

@hinerm If you add --root-dir=_site, those bogus errors should go away.
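
For illustration, the earlier command with that option added might look roughly like the sketch below. The flags are copied as written in this thread and their spelling may differ between htmlproofer versions; the --url-ignore argument is left out because its value is not clear here. The check_page name, the _site/ prefix on the page path, and the HTMLPROOFER override (handy for a dry run) are assumptions.

```shell
#!/bin/sh
# Sketch: check one built page, resolving /-prefixed links against the
# build output directory instead of the page's own folder.
# HTMLPROOFER is a hypothetical override, e.g. HTMLPROOFER=echo to print
# the command's arguments without running the checker.
check_page() {
  page="$1"                # e.g. update-sites/index.html
  site_dir="${2:-_site}"   # Jekyll's default output directory
  "${HTMLPROOFER:-htmlproofer}" "$site_dir/$page" \
    --root-dir="$site_dir" \
    --disable_external --assume_extension --allow-hash-href
}
```

Running `HTMLPROOFER=echo check_page update-sites/index.html` prints the arguments that would be passed, which makes it easy to confirm the paths before a real run.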

As for whether we should have relative links: no, we shouldn't. But /update-sites/tos is not a relative link, it's an absolute one—just to make sure we have our shared terminology straight. I am a fan of all internal links starting with /, and eschewing the relative_url Liquid filter completely.
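
One way to see how far the site is from that convention is to list every href in the built HTML that starts with neither /, nor #, nor a URL scheme. This is a sketch under assumptions: the function name is hypothetical, it only looks at double-quoted href attributes, and it assumes a grep with -r, -o and --include.

```shell
#!/bin/sh
# Hypothetical helper: print href values that are truly relative,
# i.e. not root-absolute (/...), not fragments (#...), and not
# scheme-qualified (https:, mailto:, ...).
find_relative_hrefs() {
  dir="${1:-_site}"
  grep -rnoE --include='*.html' 'href="[^"]*"' "$dir" |
    grep -vE 'href="(/|#|[A-Za-z][A-Za-z0-9+.-]*:)'
}
```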

ctrueden avatar May 07 '21 23:05 ctrueden

> @hinerm If you add --root-dir=_site, those bogus errors should go away.

Thank you @ctrueden!

hinerm avatar May 10 '21 15:05 hinerm

The script _bin/check-site-html.sh now checks for broken links using htmlproofer (thanks @hinerm!). This issue is now "half done" in that the automation is robust. We just need to:

  1. Actually fix all outstanding broken links; and
  2. Hook up the _bin checker scripts to CI to ensure nothing breaks again in future.
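
For step 2, a CI job mostly needs a single entry point that runs each checker and fails the build if any of them fails. A minimal sketch, assuming the checker scripts are executable and signal problems through a non-zero exit status (the run_checks name is hypothetical):

```shell
#!/bin/sh
# Hypothetical CI entry point: run every checker script given as an
# argument, report each result, and return non-zero if any failed.
run_checks() {
  failed=0
  for script in "$@"; do
    if "$script"; then
      echo "PASS $script"
    else
      echo "FAIL $script"
      failed=1
    fi
  done
  return "$failed"
}
```

A CI step could then call `run_checks _bin/check-site-html.sh` after the site build. All scripts run even if an early one fails, so one run reports every problem.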

ctrueden avatar May 14 '21 15:05 ctrueden