[CI] Spell-check for documentation?
Would it be possible to add spell checking to CI for documentation (both student-facing and reference)?
Possibly extending to all Markdown files (or other doc file types) throughout.
Depending on the method used, there could be potential to extend to code samples and stubs for languages that opt in (this may cause issues, for example in C with function names like strlen()).
Justification:
There are people from all over the world contributing to Exercism. We cannot expect everyone to know the correct way to spell everything. Even those of us for whom English is a first language quite often have typos or incorrect spellings in documents.
This might additionally help with those persnickety en-* vs en-US issues that sneak in via muscle memory!
At the moment such items can be caught in review, but this is extra load on a reviewer (who also may not recognise the incorrect spelling or typo). Removing this kind of low-level concern from reviewers, testers and integrators is pretty much the reason CI is a thing.
Fixing typos that make it through review requires another PR. PRs for such relatively small issues may not be deemed worthwhile by anyone who does notice the issue. However, a CI check can be added with relatively low effort.
Possible Methods:
There are several spellcheckers that can run in a shell (e.g. GNU Aspell and Hunspell). These could be configured to ignore any inline code sections (or, at a stretch, a script could normalise files by removing inline code [remark?] before checking). Some spellcheckers will even spellcheck code as well, though I don't know whether these would work well across the variety of languages and associated casing conventions that Exercism covers.
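As a rough illustration of the "normalise files by removing inline code" idea, here is a minimal Python sketch. It is not how remark or any existing spellchecker does it; the function name `strip_code` and both regular expressions are assumptions made for this example, and they only handle simple fences and single-backtick spans.

```python
import re

# Fenced blocks: a line starting with ``` or ~~~ through the matching closing fence.
# (Assumption: fences start at column 0 and are not nested.)
FENCED_BLOCK = re.compile(r"^(```|~~~).*?^\1[^\n]*$", re.DOTALL | re.MULTILINE)

# Inline spans: text between single backticks on one line.
INLINE_CODE = re.compile(r"`[^`\n]+`")


def strip_code(markdown: str) -> str:
    """Return the Markdown text with fenced blocks and inline code spans removed."""
    without_blocks = FENCED_BLOCK.sub("", markdown)
    return INLINE_CODE.sub("", without_blocks)
```

The output could then be piped to whichever spellchecker is chosen, so identifiers like strlen() never reach it.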
Caveat:
Any such CI check should, at least at first, be non-blocking, or allow very easy addition of terms to the dictionary to cover programming and software terms that are not in common use, or both.
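To make the caveat concrete, here is a small Python sketch of a check that layers a project allowlist over a base dictionary and reports misses without failing by default. Everything in it is hypothetical (`unknown_words`, `report`, the `blocking` flag, and the plain set lookup standing in for a real spellchecker); it only illustrates the shape of the behaviour.

```python
import re

# Words are runs of letters and apostrophes; digits and punctuation are ignored.
WORD = re.compile(r"[A-Za-z']+")


def unknown_words(text, dictionary, allowlist=frozenset()):
    """Return the sorted, lower-cased words found in neither word list."""
    known = {w.lower() for w in dictionary} | {w.lower() for w in allowlist}
    found = {w.lower().strip("'") for w in WORD.findall(text)}
    return sorted(w for w in found if w and w not in known)


def report(text, dictionary, allowlist=frozenset(), blocking=False):
    """Print possible misspellings; return a non-zero status only when blocking."""
    misses = unknown_words(text, dictionary, allowlist)
    for word in misses:
        print(f"possible misspelling: {word}")
    return 1 if (blocking and misses) else 0
```

Adding a term such as strlen to the allowlist is then a one-line change, and the check can be switched to blocking later once the word list has settled.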
Great idea @wolf99! There is a plugin, retext-spell, that uses Hunspell, I think. I've seen other OSS teams leverage this to unify contributions in a great way. (https://github.com/ember-learn/guides-source/blob/d35a871a417e9c425a945369b849b0864cba1a23/.remarkrc.js has an example configuration for remark.)
Possibly also @SaschaMann as CI related, I think that's your wheelhouse - thoughts?
Sounds good. Ideally it should add inline annotations or, better yet, suggestions to automatically fix the spelling mistake. Perhaps there are actions for it on the marketplace already. I think having it as a normal pass/fail check might be quite bothersome.
Good suggestions both. I'll look into it for a PR. I'm not at all au fait with GitHub Actions, so it may take a wee while 😄
Just as an FYI here, life has got really busy since making the above assertion. I'm having trouble fitting in time for even any v2 track maintenance effort. So if anyone else wants to pick this up until I get back to things, be my guest 😉
As a maintainer and avid grammarian, I would love it if the spell check could check the description field of links.json.
This snippet would extract them when executed in a track's repository:
find . -name 'links.json' | xargs jq '.[] | .description'
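For a CI script that would rather not shell out to jq, a rough Python equivalent might look like the sketch below. The function name `link_descriptions` is made up for this example, and it assumes each links.json holds a top-level list of objects, which is what the jq filter above implies; entries without a description are skipped.

```python
import json
from pathlib import Path


def link_descriptions(root="."):
    """Yield every 'description' string found in links.json files under root."""
    for path in sorted(Path(root).rglob("links.json")):
        with path.open(encoding="utf-8") as handle:
            entries = json.load(handle)
        # Assumption: the file is a JSON array of link objects.
        for entry in entries:
            description = entry.get("description")
            if description is not None:
                yield description
```

Each yielded string can then be handed to the spellchecker the same way as the documentation text.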
This action has the most stars on the Actions marketplace: https://github.com/marketplace/actions/github-spellcheck-action
- uses pyspelling
- aspell or hunspell
@SaschaMann This is a good target for an org-wide check.