
[CI] Spell-check for documentation?

Open wolf99 opened this issue 5 years ago • 8 comments

Would it be possible to add spell checking to CI for documentation (both student-facing and reference)? This could possibly extend to all Markdown files (or other doc file types) throughout. Depending on the method used, there could be potential to extend it to code samples and stubs for languages that opt in (though this may cause issues, for example in C with function names like strlen()).

Justification:

There are people from all over the world contributing to Exercism. We cannot expect everyone to know the correct way to spell everything. Even those of us for whom English is a first language quite often have typos or incorrect spellings in documents.

This might additionally help with those persnickety en-* vs en-US issues that sneak in via muscle memory!

At the moment such items can be caught in review, but this is extra load on a reviewer (who also may not recognise the incorrect spelling or typo). Removing this kind of low-level concern from reviewers, testers and integrators is pretty much the reason CI is a thing.

Fixing typos that make it through review requires another PR. PRs for such relatively small issues may not be deemed worthwhile by anyone who does notice the issue. However, a CI check can be added with relatively low effort.

Possible Methods:

There are several spellcheckers that can run in a shell (e.g. GNU Aspell, Hunspell). These could be configured to ignore any inline code sections (or, at a stretch, a script could normalise files by removing inline code [remark?] before checking). There are some spellcheckers that will even spellcheck code, though I don't know if these would work well across the variety of languages and associated casing conventions that Exercism covers.
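As a rough illustration of the "normalise files by removing inline code" idea, here is a minimal shell sketch. The function name `strip_code` is hypothetical, and it assumes code is marked only with triple-backtick fences and single-backtick inline spans; indented code blocks and other Markdown constructs are not handled.

```shell
# Hypothetical helper: print Markdown with code removed, ready for a spellchecker.
# Reads the files given as arguments, or stdin when there are none.
strip_code() {
  # awk: a line starting with three backticks toggles "inside a fence";
  #      fence lines and everything between them are dropped.
  # sed: remove inline spans delimited by single backticks on one line.
  awk '/^[ \t]*[`][`][`]/ { in_fence = !in_fence; next } !in_fence' "$@" \
    | sed 's/`[^`]*`//g'
}
```

The output could then be piped to a checker, for example `strip_code README.md | aspell list --lang=en`, assuming Aspell and an English dictionary are installed.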

Caveat:

Any such CI check should, at least at first, either be non-blocking or allow very easy addition of terms to the dictionary (or both), to cover programming and software terms that are not in common use.
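A minimal sketch of how both halves of this caveat might look in a shell step. The function name `filter_known_terms` and the allowlist file (one accepted term per line) are assumptions for illustration, not an existing Exercism convention.

```shell
# Hypothetical helper: read flagged words on stdin and print only those
# that are NOT in the allowlist file given as $1.
filter_known_terms() {
  allowlist=$1
  # -v: keep lines that do not match   -x: match whole lines only
  # -i: ignore case                    -F: fixed strings, not regexes
  # -f: read the patterns from the allowlist file
  # grep exits 1 when no words remain, so "|| true" keeps the step
  # from failing the build (non-blocking).
  grep -v -x -i -F -f "$allowlist" || true
}
```

Adding a term is then a one-line change to the allowlist file, and the step reports leftover words without ever returning a failing status.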

wolf99 avatar Apr 13 '20 10:04 wolf99

Great idea @wolf99! There is a plugin, retext-spell, that uses Hunspell, I think. I've seen other OSS teams use this to unify contributions in a great way. (https://github.com/ember-learn/guides-source/blob/d35a871a417e9c425a945369b849b0864cba1a23/.remarkrc.js has an example configuration for remark.)

ghost avatar Apr 13 '20 13:04 ghost

Possibly also @SaschaMann as CI related, I think that's your wheelhouse - thoughts?

wolf99 avatar Apr 15 '20 21:04 wolf99

Sounds good. Ideally it should add inline annotations or, better yet, suggestions to automatically fix the spelling mistake. Perhaps there are actions for it on the marketplace already. I think having it as a normal pass/fail check might be quite bothersome.
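One way inline annotations might be produced without a marketplace action is GitHub Actions' `::warning` workflow command, which attaches a message to a file and line. The sketch below is only an illustration: the function name `annotate_misspellings` is hypothetical, and it assumes the flagged words arrive on stdin, one per line, from whatever spellchecker is used.

```shell
# Hypothetical helper: for each flagged word on stdin, emit a GitHub Actions
# warning annotation for every line of file $1 that contains that word.
annotate_misspellings() {
  file=$1
  while IFS= read -r word; do
    [ -n "$word" ] || continue
    # grep -n prefixes each match with "LINE:"; -w matches whole words only;
    # -F treats the word as a fixed string rather than a regex.
    grep -n -w -F -- "$word" "$file" | while IFS=: read -r line _; do
      printf '::warning file=%s,line=%s::Possible misspelling: %s\n' \
        "$file" "$line" "$word"
    done
  done
}
```

Because the warnings are only printed, the step still succeeds, which fits the concern about a hard pass/fail check.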

SaschaMann avatar Apr 15 '20 22:04 SaschaMann

Good suggestions both. I'll look into it for a PR. I'm not at all au fait with GitHub Actions, so it may take a wee while 😄

wolf99 avatar Apr 15 '20 22:04 wolf99

Just as an FYI here, life has got really busy since I made the above assertion. I'm having trouble fitting in time for even v2 track maintenance. So if anyone else wants to pick this up until I get back to things, be my guest 😉

wolf99 avatar May 16 '20 20:05 wolf99

As a maintainer and avid grammarian, I would love it if the spell check could check the description field of links.json. This snippet would extract them when executed in a track's repository:

find . -name 'links.json' | xargs jq '.[] | .description'

ghost avatar Feb 24 '21 14:02 ghost

This action has the most stars of the spell-check actions: https://github.com/marketplace/actions/github-spellcheck-action

ghost avatar Mar 08 '21 15:03 ghost

@SaschaMann This is a good target for an org-wide check.

iHiD avatar Mar 08 '21 17:03 iHiD