stumptown-content icon indicating copy to clipboard operation
stumptown-content copied to clipboard

Should spelling be breaking CI?

Open peterbe opened this issue 4 years ago • 10 comments

E.g. https://travis-ci.org/mdn/stumptown-content/builds/572764499

(if the logs disappear, here's rougly what it says)

> [email protected] spell-md /home/travis/build/mdn/stumptown-content
> mdspell -a -n -r -x --en-us 'content/**/!(*contributors).md'

    [1mcontent/html/guides/Applying_color.md[22m
       27 |  as the addition of under- or [31moverlines[39m, strike-through lines, and so 
       39 | orations (such as underlines, [31mstrikethroughs[39m, etc) use the `color` propert 
       97 |     Lets you draw 2D [31mbitmapped[39m graphics in a [`<canvas>`](/e 
      105 | he Web Graphics Library is an [31mOpenGL[39m ES-based API for drawing high 
      125 | s a number between 0 and 255 ([31m0x00[39m and 0xFF) or, optionally, as  
      125 | r between 0 and 255 (0x00 and [31m0xFF[39m) or, optionally, as a number  
      125 | as a number between 0 and 15 ([31m0x0[39m and 0xF). All components _mus 
      125 | ber between 0 and 15 (0x0 and [31m0xF[39m). All components _must_ be sp 
      169 | ite). Image courtesy of user [[31mSharkD[39m](http://commons.wikimedia.org 
      171 | ees (`deg`), radians (`rad`), [31mgradians[39m (`grad`), or turns (`turn`).  
      178 | 2. Then select a [31mgrayscale[39m paint that corresponds how br 
      193 | t a color. Perhaps you have a [31mcustomizable[39m user interface, or you're imp 
      215 | rk. For example, the website [[31mColorZilla[39m](http://www.colorzilla.com/)  
      226 | - [[31mPaletton[39m](http://paletton.com) 
      237 | my.org/) in association with [[31mPixar[39m](https://www.pixar.com/)) 
      239 | o express ideas. Presented by [31mPixar[39m artists and designers. 
      255 | - [[31mMedline[39m Plus: Color Blindness](https: 
      256 | - [American Academy of [31mOphthamology[39m: What Is Color Blindness?](ht 
      257 | 010/02/color-blindness.html) ([31mUsability.gov[39m: United States Department of  
      261 | there. We carefully avoid the [31mmockups[39m and the photos from movies. A 
      261 | hoto taken by one of the Mars [31mlanders[39m humanity has parked on the su 
      265 | ur palette. We decide to use [[31mPaletton[39m](http://www.paletton.com/) to 
      265 |  colors we need. Upon opening [31mPaletton[39m, we see: 
      273 |  (currently "Monochromatic"). [31mPaletton[39m computes an appropriate accen 
      299 | s a "don't print backgrounds" [31mcheckbox[39m in a print dialog box), that  

[31m>>[39m 25 spelling errors found in 274 files

I haven't been involved in the spell-md integration work but this seems very draconian. For example, what's wrong with the word "checkbox" given the eco-system we're in?

One can configure branches of a matrix in .travis.yml to allow failures and they show up as warnings. But!! ...who clicks into a TravisCI build (to see the warnings) as soon as you get that sweet green checkbox icon on the pull request?

To me, a much more attractive option would be to only run md-spell if all the other CI stuff passes. And, if there are spell check problems, post that as an automated comment on the pull request. As a matter of fact, since you have to script that anyway, would could do something like this:

function didmarkdownchange() {
    git diff --name-only origin/master | grep "\.md" && return
    false
}

if didmarkdownchange; then
    echo "At least one .md file changed"
    # TODO make it only spell check *changed* files
    md-spell ...
else
    echo "No .md changes, no spellcheck run"
fi

peterbe avatar Aug 19 '19 13:08 peterbe

I like the idea of incremental spell check and failing CI on spell check failure, provided we extend the dictionary to cover the legitimate words in our corpus.

ddbeck avatar Aug 19 '19 15:08 ddbeck

provided we extend the dictionary to cover the legitimate words in our corpus.

Fair enough. If you think we can pull it off. It might be a whack-a-mole thing where we add a couple of web-dev'y terms and then someone's PR breaks because of a new legitmate word (e.g. DOMColorPalette) and then you have to update the spelling overrides and rebase the PR etc.

By the way, why does it complain about the word "Pixar"?? Since it's a noun, it's written capitalized. Shouldn't the spell checker notice that and learn to avoid it?

peterbe avatar Aug 19 '19 16:08 peterbe

markdown-spellcheck has a mode to help fix or accept new words quickly: https://www.npmjs.com/package/markdown-spellcheck#interactive-mode Maybe we need a notice after a failed spell check to run spell-check with the extra flag.

As for why, I'm guessing that a mode which ignores proper nouns would have no way of discerning a proper noun at the beginning of a sentence from a common noun.

ddbeck avatar Aug 19 '19 16:08 ddbeck

See also https://github.com/mdn/stumptown-content/issues/71 though.

FWIW I think if we are going to run spellcheck in CI then it should fail CI if spellcheck fails. Otherwise why bother?

I agree that we might get into whack-a-mole, but let's see if we do and not give up right away. I haven't looked into its config very much (and am not sure if there's any point, given https://github.com/mdn/stumptown-content/issues/71) but maybe you could omit words in code formatting, which would help with identifiers?

wbamberg avatar Aug 20 '19 21:08 wbamberg

BTW the spellcheck for that file did catch several real typoes in the MDN source, which I fixed in the PR. So it did add value as well as noise :).

wbamberg avatar Aug 20 '19 21:08 wbamberg

should fail CI if spellcheck fails. Otherwise why bother?

This is where I think I can offer a suggestion; if the PR has spell checking issues, don't break the build but post a comment on the PR here on GitHub. That gets pretty good visibility without being obnoxious or overly noisy.

It's akin to how the Kuma coverage reports work. There might be perfectly sane reasons why the total unit test coverage goes down so it's good that it doesn't break the build. Codecov's post relatively neat comments. For example that are there to say "Are you sure this is good? Did you consider that the test coverage went down?"

If we do do this, it would be crucial to only post comments about spell checking problems in files you touched in the PR.

peterbe avatar Aug 21 '19 18:08 peterbe

run spellcheck in CI then it should fail CI if spellcheck fails. Otherwise why bother?

Also, it might be that we have the spell checking functionality as a tool but don't run it in Travis. It's less attractive as an option but it's an option :)

Anyway, let's tackle https://github.com/mdn/stumptown-content/issues/71 and then start doing the whack-a-mole strategy till we learn more.

peterbe avatar Aug 21 '19 18:08 peterbe

The most obnoxious errors I can see in this output are these:

      125 | s a number between 0 and 255 (0x00 and 0xFF) or, optionally, as  
      125 | r between 0 and 255 (0x00 and 0xFF) or, optionally, as a number  
      125 | as a number between 0 and 15 (0x0 and 0xF). All components _mus 
      125 | ber between 0 and 15 (0x0 and 0xF). All components _must_ be sp 

I couldn't see a way to add wildcards to .spelling for these errors.

wbamberg avatar Aug 21 '19 18:08 wbamberg

I don't know about GitHub actions but something I'd like to explore is to see if spelling errors can cause a warning rather than an error.

peterbe avatar Sep 06 '19 13:09 peterbe

It's akin to how the Kuma coverage reports work. There might be perfectly sane reasons why the total unit test coverage goes down so it's good that it doesn't break the build. Codecov's post relatively neat comments. For example that are there to say "Are you sure this is good? Did you consider that the test coverage went down?"

I'm not sure this is a good analogy, because false positives aside, I don't think there are perfectly sane reasons why our docs should have spelling errors. And for false positives it seems like we will want to want to teach the tool about them. Otherwise the tool will complain about the same false positive every time someone edits that file, and leaving false positives is noise that can mask real errors.

If spelling errors don't break the build then the obvious concern is that people won't pay attention to them, and then why bother checking for them?

Overall though I think it is too early to be having this conversation. So far spellcheck has broken the build once, and it took me a minute to make a PR to fix the errors. Let's update the tool (https://github.com/mdn/stumptown-content/issues/71) and see how much of a burden spellcheck is.

wbamberg avatar Sep 12 '19 22:09 wbamberg