tidyverse.org
Add link-check linter
@jimhester, did you mention the existence of something like this once?
Yeah, https://docs.travis-ci.com uses https://github.com/gjtorikian/html-proofer to do this.
There is some documentation on using it with Travis at https://github.com/gjtorikian/html-proofer/wiki/Using-HTMLProofer-From-Ruby-and-Travis
If we wanted to do this, we could probably just do the 'stupid thing' they have on there:
language: ruby
before_install:
  - export NOKOGIRI_USE_SYSTEM_LIBRARIES=true
addons:
  apt:
    packages:
      - libcurl4-openssl-dev # required to avoid SSL errors
script:
  - gem install html-proofer && htmlproofer .
Great.
There's also the rakefile option:
desc "Run the HTML-Proofer"
task :run_proofer do
  require 'html-proofer'
  # Ignore platform switcher hash URLs
  platform_hash_urls = ['#platform-mac', '#platform-windows', '#platform-linux', '#platform-all']
  HTMLProofer.check_directory("./output", {
    :url_ignore => platform_hash_urls,
    :typhoeus => { :ssl_verifypeer => false }
  }).run
end
https://github.com/atom/flight-manual.atom.io/blob/e6fa143f745fbd4908933e2ba67615eac8b24cff/Rakefile#L30-L40
Rmd link checker? https://github.com/fmichonneau/checker
Might be worth looking into whether there's a pre-made GitHub Action for this.
There seem to be a few, e.g. https://github.com/marketplace/actions/link-checker and https://github.com/gaurav-nelson/github-action-markdown-link-check.
After trying them out locally, it seems really easy to hit the GitHub rate limits and start getting 429 responses from GitHub. We would probably have to be careful to check only modified files, and maybe even find a link checker (or make one) that uses a GITHUB_PAT to raise the rate limit.
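For what it's worth, here is a rough sketch of those two ideas, in Ruby only because that's what the html-proofer examples use: wait and retry when a server answers 429, and send a GITHUB_PAT along with requests. The names (`headers_for`, `http_get_status`, `check_with_backoff`) are made up for illustration. Sending the token only to api.github.com is an assumption: authenticated API requests get a higher limit, but whether a token changes anything for ordinary github.com pages is not established here.

```ruby
require 'net/http'
require 'uri'

# Headers for one request. Assumption: the token is only sent to api.github.com.
def headers_for(url, token = ENV['GITHUB_PAT'])
  headers = { 'User-Agent' => 'link-check-sketch' }
  if URI(url).host == 'api.github.com' && token && !token.empty?
    headers['Authorization'] = "Bearer #{token}"
  end
  headers
end

# Default fetcher: one GET request.
# Returns [status code, Retry-After in seconds (0 if absent or not a number)].
def http_get_status(url)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, headers_for(url))
  end
  [response.code.to_i, response['Retry-After'].to_i]
end

# Check a URL, waiting and retrying while the server answers 429.
# `fetch` and `sleeper` are injectable so the logic can be tried offline.
def check_with_backoff(url, max_tries: 4, fetch: method(:http_get_status), sleeper: method(:sleep))
  delay = 1
  status = nil
  max_tries.times do |attempt|
    status, retry_after = fetch.call(url)
    return status unless status == 429
    break if attempt == max_tries - 1
    # Prefer the server's Retry-After hint; otherwise back off exponentially.
    sleeper.call(retry_after.to_i > 0 ? retry_after : delay)
    delay *= 2
  end
  status
end
```

Calling `check_with_backoff(url)` returns the last status code seen, so a link that is still rate-limited after `max_tries` attempts comes back as 429 and can be reported as "unknown" rather than "broken".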
Sharing my experience in case it's useful: I've just added two workflows to the R-hub docs that depend on https://github.com/urlstechie/urlchecker-action, which I found out about on a blog. It's quite new, but the maintainers are very responsive. It uses a Python library that finds URLs with regex rather than commonmark :cry:, but that's because it's meant to work on any file (Rd files? :slightly_smiling_face:).
- A workflow that runs when the label "needs-url-checks" is applied to a PR. It creates a check run magically. I used the GitHub API to find the files that had been modified. However, that means only 30 filenames would appear (an alternative would be to use the checkout action for all branches or so, and git diff + git merge-base).
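On the 30-filename limit: 30 is the GitHub API's default page size, and the "list pull request files" endpoint accepts `per_page` (up to 100) and `page`. A minimal sketch of walking all the pages is below. The helper names (`pr_files_url`, `fetch_json`, `changed_files`) are hypothetical, and skipping files whose status is "removed" is my own choice here, not something the R-hub workflow necessarily does.

```ruby
require 'json'
require 'net/http'
require 'uri'

PER_PAGE = 100 # GitHub's maximum page size; the default is 30

# URL for one page of the "list pull request files" endpoint.
def pr_files_url(repo, pr_number, page)
  "https://api.github.com/repos/#{repo}/pulls/#{pr_number}/files?per_page=#{PER_PAGE}&page=#{page}"
end

# Default fetcher: GET one page and parse the JSON array it returns.
def fetch_json(url, token = ENV['GITHUB_PAT'])
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/vnd.github+json'
  request['Authorization'] = "Bearer #{token}" if token && !token.empty?
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
  raise "GitHub API returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end

# Collect the names of all files touched by a PR, one page at a time.
# `fetch` is injectable so the paging logic can be tried without network access.
def changed_files(repo, pr_number, fetch: method(:fetch_json))
  names = []
  page = 1
  loop do
    batch = fetch.call(pr_files_url(repo, pr_number, page))
    kept = batch.reject { |file| file['status'] == 'removed' } # deleted files have no links to check
    names.concat(kept.map { |file| file['filename'] })
    break if batch.length < PER_PAGE # a short page is the last page
    page += 1
  end
  names
end
```

Something like `changed_files('owner/repo', 123)` (placeholder values) would then give the full list to hand to the URL checker.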
Those are nice @maelle! I think we should be able to:
- Run a workflow only when posts happen, e.g. files change in a given directory
- Run the checks only on the files that have changed
Something like this should do it (it definitely won't work as-is, but you hopefully get the idea):
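on:
  push:
    paths: ['content/blog/**/*.markdown']
jobs:
  check_links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history, so the "before" commit can be diffed
      - run: |
          # run_checker is a placeholder for whichever link checker gets picked
          files=$(git diff --name-only ${{ github.event.before }} ${{ github.sha }})
          run_checker $files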
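To make the `run_checker` placeholder a bit more concrete, here is one possible first half of it: given the changed files, pull out the URLs to check. This is only a sketch. It uses a regex (like the Python library mentioned above) rather than a real markdown parser, so it will miss or mangle some links, and `URL_PATTERN`, `extract_urls` and `urls_by_file` are names invented for this example.

```ruby
# Rough URL matcher: stops at whitespace, brackets, quotes and backticks.
URL_PATTERN = %r{https?://[^\s<>()\[\]"'`]+}

# Unique URLs in a piece of text, with trailing sentence punctuation removed.
def extract_urls(text)
  text.scan(URL_PATTERN).map { |url| url.sub(/[.,;:!?]+\z/, '') }.uniq
end

# Map each existing file to the URLs found in it; paths that no longer
# exist (e.g. files deleted in the push) are skipped.
def urls_by_file(paths)
  paths.select { |path| File.file?(path) }
       .to_h { |path| [path, extract_urls(File.read(path))] }
end
```

The workflow step could then pass `$files` to a small script that calls `urls_by_file(ARGV)` and checks each URL, reporting failures per file.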
Could https://validator.w3.org/checklink maybe help? Good that this issue is being worked on. I looked this up because a link to https://rdrr.io/... just returned a 404 even though that site was not down.