tidyverse.org

Add link-check linter

Open batpigandme opened this issue 7 years ago • 11 comments

@jimhester, did you mention the existence of something like this once?

batpigandme avatar Sep 28 '18 11:09 batpigandme

Yeah, https://docs.travis-ci.com uses https://github.com/gjtorikian/html-proofer to do this.

jimhester avatar Sep 28 '18 11:09 jimhester

There is some documentation on using it with Travis at https://github.com/gjtorikian/html-proofer/wiki/Using-HTMLProofer-From-Ruby-and-Travis

jimhester avatar Sep 28 '18 11:09 jimhester

If we wanted to do this, we could probably just do the 'stupid thing' they have on there:

language: ruby
before_install:
 - export NOKOGIRI_USE_SYSTEM_LIBRARIES=true
addons:
  apt:
    packages:
    - libcurl4-openssl-dev # required to avoid SSL errors
script:
 - gem install html-proofer && htmlproofer .

jimhester avatar Sep 28 '18 12:09 jimhester

Great.

There's also the rakefile option:

desc "Run the HTML-Proofer"
task :run_proofer do
  require 'html-proofer'

  # Ignore platform switcher hash URLs
  platform_hash_urls = ['#platform-mac', '#platform-windows', '#platform-linux', '#platform-all']
  HTMLProofer.check_directory("./output", {
    :url_ignore => platform_hash_urls,
    :typhoeus => { :ssl_verifypeer => false }
  }).run
end

https://github.com/atom/flight-manual.atom.io/blob/e6fa143f745fbd4908933e2ba67615eac8b24cff/Rakefile#L30-L40

batpigandme avatar Sep 28 '18 12:09 batpigandme

Rmd link checker? https://github.com/fmichonneau/checker

batpigandme avatar Nov 16 '18 12:11 batpigandme

Might be worth checking whether there's a pre-made GitHub Action for this.

hadley avatar Apr 03 '20 14:04 hadley

There seem to be a few, e.g. https://github.com/marketplace/actions/link-checker and https://github.com/gaurav-nelson/github-action-markdown-link-check.

jimhester avatar Apr 06 '20 13:04 jimhester

After trying them out locally, it seems really easy to hit GitHub's rate limits and start getting 429 responses. We would probably have to be careful to check only modified files, and maybe even find (or write) a link checker that uses a GITHUB_PAT to raise the rate limit.
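
As a rough illustration of the two ideas here (authenticating with a PAT and backing off on 429), a link checker could attach the token only to GitHub API requests and wait before retrying. This is a minimal sketch, not taken from any of the checkers linked in this thread: `request_headers`, `retry_delay`, and the `link-check-sketch` user agent are hypothetical names, and it assumes the checker talks to `api.github.com`, where authenticated requests get a higher hourly limit.

```ruby
require "uri"

# Hypothetical helpers; not part of any link checker mentioned in this thread.
GITHUB_API_HOST = "api.github.com"

# Build request headers, attaching the token only for GitHub API requests
# so it is never sent to third-party hosts.
def request_headers(url, env = ENV)
  headers = { "User-Agent" => "link-check-sketch" }
  token = env["GITHUB_PAT"]
  host = URI.parse(url).host
  if host == GITHUB_API_HOST && token && !token.empty?
    headers["Authorization"] = "token #{token}"
  end
  headers
end

# Return the number of seconds to wait before retrying, or nil when the
# response was not a 429. A numeric Retry-After header wins; otherwise
# fall back to exponential backoff based on the attempt number.
def retry_delay(status, response_headers, attempt)
  return nil unless status == 429
  retry_after = response_headers["Retry-After"]
  return retry_after.to_i if retry_after.to_s.match?(/\A\d+\z/)
  2**attempt
end
```

The helpers are kept free of network calls so they can be wired into whichever HTTP client the checker already uses.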

jimhester avatar Apr 06 '20 14:04 jimhester

Sharing my experience in case it's useful: I've just added two workflows to the R-hub docs that depend on https://github.com/urlstechie/urlchecker-action, which I found out about on a blog. It's quite new, but the maintainers are very responsive. It uses a Python library that relies on regex, not commonmark :cry: but that's because it's general to any file (Rd files? :slightly_smiling_face:).

maelle avatar Apr 14 '20 18:04 maelle

Those are nice @maelle! I think we should be able to:

  1. Run a workflow only when posts are added or changed, i.e. when files change in a given directory
  2. Run the checks only on the files that have changed

with something like this (it definitely won't work as-is, but you hopefully get the idea):

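on:
  push:
    paths:
      - 'content/blog/**/*.markdown'
jobs:
  check_links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0  # full history, so the "before" commit is available to git diff
      - run: |
          files=$(git diff --name-only ${{ github.event.before }} ${{ github.sha }})
          run_checker $files  # run_checker is a placeholder, not a real command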
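
One gap in the sketch above is that `git diff --name-only` lists every changed file in the push, not just blog posts. A possible way to narrow the list before handing it to a checker is shown below. This is only an illustration: `changed_posts` and `changed_files` are hypothetical helper names, and the glob simply mirrors the `paths` filter from the workflow.

```ruby
require "open3"

# Mirrors the `paths` filter in the workflow sketch above.
POST_GLOB = "content/blog/**/*.markdown"

# Keep only the paths that match the blog-post glob.
# With FNM_PATHNAME, "*" does not cross "/" and "**/" matches
# directories recursively.
def changed_posts(paths, glob = POST_GLOB)
  paths.select { |path| File.fnmatch?(glob, path, File::FNM_PATHNAME) }
end

# List files changed between two commits by shelling out to git.
# Must be run inside a checkout that contains both commits.
def changed_files(before, after)
  out, status = Open3.capture2("git", "diff", "--name-only", before, after)
  raise "git diff failed" unless status.success?
  out.lines.map(&:chomp)
end
```

In a workflow step this could be combined as `changed_posts(changed_files(before_sha, after_sha))`, with the two SHAs passed in from the push event.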

jimhester avatar Apr 14 '20 18:04 jimhester

Could https://validator.w3.org/checklink maybe help? Good that this issue is being worked on. I looked it up because a link to https://rdrr.io/... just returned a 404 even though the site was not down.

steltenpower avatar Sep 21 '20 09:09 steltenpower