
Check that links work before deploying

Open penelopeysm opened this issue 1 year ago • 10 comments

Since our internal links now all use meta variables, I'm not sure if there's any software out there that can check link validity at the source code stage. (We could probably write one, but that feels a bit excessive.)

So instead we should probably run some sort of HTML link checker on the Quarto output itself, perhaps https://github.com/filiph/linkcheck

penelopeysm avatar Oct 13 '24 11:10 penelopeysm

Even if we run a link check, our broken links will still return a 404 page. Should these be considered broken links, or is it acceptable since they return some HTML content? Also, it would be better to run the link check directly on our Quarto markdown pages, so we don’t have to wait for the full site to render just to get link check approval.

shravanngoswamii avatar Oct 14 '24 06:10 shravanngoswamii

I totally agree about running it on the Quarto docs being preferable, but I'm just not sure about it from a practical point of view -- do we have something that is capable of checking relative links between documents? I do think we could put something together to do that, but I'm not sure it's really worth the time.

I don't immediately see how we would get 404s! Could you explain?

penelopeysm avatar Oct 14 '24 10:10 penelopeysm

@penelopeysm I raised this discussion earlier on #530 (https://github.com/TuringLang/docs/pull/530#issuecomment-2402730693). Maybe this is possible if we think it through. I agree it may add some overhead, but it may solve the issue.

beingPro007 avatar Oct 21 '24 16:10 beingPro007

@beingPro007, feel free to open a PR. I'll mention quickly that the workflow you suggested in https://github.com/TuringLang/docs/pull/530#issuecomment-2402730693 probably needs to be tweaked:

  1. If you want to run this on a fresh workflow, you need to check out the gh-pages branch as that is where the docs are located.
  2. It would be much better to add the link checking into the existing preview and publish workflows, rather than to make a new workflow.

penelopeysm avatar Oct 22 '24 22:10 penelopeysm

> do we have something that is capable of checking relative links between documents? I do think we could put something together to do that, but I'm not sure it's really worth the time.

I am also not sure of this!

> I don't immediately see how we would get 404s! Could you explain?

I do not know how link checkers work. I am just guessing that they request the link and, if they do not get any HTML content back, consider that link broken. In our case, if a broken link is requested, it will return the HTML content of a 404 error page like this one: https://turinglang.org/broken

shravanngoswamii avatar Oct 23 '24 05:10 shravanngoswamii

IMO, it would be much better to make a reusable workflow that checks links in HTML, md, mdx, and other markup languages!

shravanngoswamii avatar Oct 23 '24 05:10 shravanngoswamii

> if they do not get any HTML content from it then the particular link is considered broken

It's true that you will still get HTML content, but the response code will still be a 404, and that should count as a failure:

$ curl -I https://turinglang.org/nope
HTTP/2 404
server: GitHub.com
content-type: text/html; charset=utf-8
access-control-allow-origin: *
etag: "66fd83a7-7010"
x-proxy-cache: MISS
x-github-request-id: D3B0:34BBDA:3A59292:3B14866:6718E69B
accept-ranges: bytes
date: Wed, 23 Oct 2024 12:06:46 GMT
via: 1.1 varnish
age: 58
x-served-by: cache-lhr-egll1980035-LHR
x-cache: HIT
x-cache-hits: 1
x-timer: S1729685206.999750,VS0,VE5
vary: Accept-Encoding
x-fastly-request-id: 7ab19db7f227bfceaf2d9e270894f6c9c79a8192
content-length: 28688
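
For illustration, here is a minimal Python sketch of the same idea: decide whether a link is broken from the status code alone, ignoring whatever HTML body comes back. The function names `is_broken_status` and `head_status` are hypothetical, and treating every 4xx/5xx code as a failure is an assumption; a real link checker may handle redirects, retries, and rate limits differently.

```python
import urllib.error
import urllib.request


def is_broken_status(status: int) -> bool:
    """Treat any 4xx/5xx status as broken, regardless of the response body.

    Assumption: this threshold is chosen for illustration only.
    """
    return status >= 400


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request (like `curl -I`)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still
        # available on the exception object.
        return err.code
```

With this, `is_broken_status(head_status(url))` would flag the 404 page above as a failure even though it serves HTML.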

penelopeysm avatar Oct 23 '24 12:10 penelopeysm

I wasn't aware of that, thanks for the clarification!

shravanngoswamii avatar Oct 23 '24 14:10 shravanngoswamii

I’ve been experimenting with some link checkers, and while it’s straightforward to run them on our generated HTML, it might not be the best approach for us as developers.

I found tcort/markdown-link-check, which works well for Quarto documents and correctly checks relative links. However, I'm encountering issues with the meta and var shortcodes used in our links. @penelopeysm, do you have any suggestions for how we can address this?

  • One option is to write a small extension/filter for Quarto to handle these meta and var shortcodes, but it might not be worth the effort since these shortcodes aren’t primarily intended for links.
  • Alternatively, we could create a composite action using tcort/markdown-link-check and manually handle the meta and var shortcodes with a shell script or another scripting language that works easily across Windows, Mac, and Linux so users can also run our link checker locally!
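
As a rough sketch of the second option, the shortcodes could be expanded into plain text before the markdown link checker runs. The Python below assumes the values have already been loaded into two dictionaries (for example from `_quarto.yml` and `_variables.yml`); the function names and the keys in the usage note are hypothetical, and this is not how Quarto itself resolves shortcodes.

```python
import re

# Matches Quarto shortcodes of the form {{< meta key >}} or {{< var key >}}.
SHORTCODE = re.compile(r"\{\{<\s*(meta|var)\s+([^\s>]+)\s*>\}\}")


def lookup(mapping, dotted_key):
    """Resolve a key such as "a.b" in a nested dict; None if it is missing."""
    value = mapping
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def expand_shortcodes(text, meta, variables):
    """Replace meta/var shortcodes with their values.

    Returns the expanded text plus a list of shortcodes that could not be
    resolved (those are left in the text unchanged).
    """
    unresolved = []

    def replace(match):
        kind, key = match.group(1), match.group(2)
        source = meta if kind == "meta" else variables
        value = lookup(source, key)
        if value is None:
            unresolved.append(f"{kind} {key}")
            return match.group(0)
        return str(value)

    return SHORTCODE.sub(replace, text), unresolved
```

For example, with a hypothetical key `guide-url`, calling `expand_shortcodes("[Guide]({{< meta guide-url >}})", {"guide-url": "tutorials/guide"}, {})` yields `[Guide](tutorials/guide)` and an empty unresolved list. The expanded copies could then be handed to the link checker, and any unresolved shortcode reported as an error.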

shravanngoswamii avatar Nov 03 '24 16:11 shravanngoswamii

I think that's pretty much what I meant when I said I wasn't sure whether it was worth our time.

  1. Personally, I think the best reward / effort ratio is to just run the link checker on the generated HTML. It means that it is difficult to check it locally (you have to render the pages first) but at least you can see the output from CI and correct things as needed after that.

  2. One other option is to just skip checking the meta and var shortcodes, which will make it easy to do locally, but you'll never know when one of those is broken, which I'm not comfortable with.

  3. Or as you say, we could write something ourselves. You're welcome to try either of those approaches (I think a custom shell / Julia script would be the best!) but that's not something I'd personally sink time into.

penelopeysm avatar Nov 03 '24 16:11 penelopeysm