docs: Check markdown files for bad links using linkinator
SUMMARY
Adding a little CI magic to test markdown files for problem links using Linkinator. Hopefully this remedies some of the issues stemming from moving pages around.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
@mistercrunch seems like the action isn't running :/
I'll investigate in the morning unless you see anything plainly obvious.
visible here -> https://github.com/apache/superset/actions/runs/9027384874
https://github.com/marketplace/actions/markdown-link-check
?
I'll open another one with that other action. This one looked simpler when I opened it, but... I guess it's not so simple!
I opened an Infra ticket to see if there's a preferred practice for cases like this: https://issues.apache.org/jira/issues/?filter=-2
Actually, that other Action is also on a 3rd party repo, so that won't be any easier.
It'd be good to know where the full complex rule lives behind this truncated one ->
Actually the truncation in the screenshot above cuts right before fully pointing to https://github.com/gaurav-nelson/github-action-markdown-link-check
Ok, opened a PR using the other action here: https://github.com/apache/superset/pull/28437
This other action, however, is in maintenance mode, pending some replacement with the author. It also doesn't support multiple file extensions in its config, so we have to run it twice. That's a bit silly, but hopefully it'll work.
Linkinator has now been added to the ASF allow list... let's see if this is simpler/faster than the other PR :)
Thanks for the approval, but this is only the tip of the iceberg :)
This found (if memory serves) something like 34 links that need to be addressed. These will need updates and/or redirects.
It does not find other ones that Google thinks exist, such as one of my perennial google searches leading to this: https://superset.apache.org/docs/contributing/hooks-and-linting/
I'm not sure (yet) if adding frontend (docusaurus) redirects will update search engines, or if .htaccess changes are needed. In either case, this (and others) are strange cases, since they no longer have a direct page equivalent, and will redirect to anchor tags.
Le sigh... much more to do here before this is over.
I mean... it seems like it'll let us merge with the test failing as-is, but that seems like bad hygiene for other PRs that follow before I clean up my mess ;)
Oh yes, we should sort out the stuff. htaccess fix is easy, lots of precedents -> https://github.com/apache/superset/blob/master/docs/static/.htaccess#L67
But better to have no links than broken links, could just wipe out the ones where finding new link isn't obvious
It does not find other ones that Google thinks exist
yes, it can't know about links we used to have, for that I did a pass on a few pages worth of google results, but may be good to go deeper. Prevention could be good too, I think there might be ways to detect that links get delete, maybe some sort of auto-generated list of all links that docusaurus serves. We can resolve in a future PR
it seems like it'll let us merge with the test failing as-is
yes, could make this a required check if you want https://github.com/apache/superset/blob/master/.asf.yaml#L58, but if/when we do this on conditional ones we need to use the ./.github/actions/change-detector/
Sorry for all the noise. A rebase went quite goofy...
I'll start chipping away at more of the failing links.