tldr
tldr copied to clipboard
Repository: Adding Codespell to our CI
We can add Codespell to the CI to help with removing the small spelling errors and typos which slip through our reviews.
I am sure we could add other dictionaries to find errors in other languages however here is a demo script
name: Codespell
on:
pull_request:
branches: [ "main" ]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Get a list of changed files
uses: jitterbit/get-changed-files@v1
id: changed_files
- uses: codespell-project/actions-codespell@master
with:
only_warn: 1 # so the CI doesn't fail (if the error is a false positive)
path: ${{ steps.changed_files.outputs.all }} # Only checks the files in the PR
skip: pages.*/*/* # Currently skipping all other languages
The output of this can be seen when reviewing the lines where the errors are seen in the files tab of a PR, making so that the contributor doesn't have to read the CI logs.
This check would be run on every PR every time a new commit is made, right? Wouldn't that make the checks take even longer?
Also, I feel like this would create a lot of false positives. I assume we use a lot of very specific words, which don't actually exist (e.g. commits
).
There would of course be false positives, however not too many.
Notable words such as crate
and some acronyms are marked as incorrect, see here for all current repo false positives (there maybe a few actual errors).
The speed difference is only about 5 seconds more (after CI spinup)
also, the false positives are not too bad since using the only warn
flag, the CI will always pass the codespell test, then comment the errors it finds.
Would we have a file in this repo, where we could add custom words?
Yes. https://github.com/codespell-project/actions-codespell#parameter-ignore_words_file
Good idea in theory, as @marchersimon I'm concerned about the false positives and the extra maintenance burden this would bring in merging new PRs. It could make things more complicated if a new author comes along and waants to contribute a page, but then we have to handle the false positives that may bring.
While I am sure that there will be false positives, if we add some existing words which are flagged, the maintenance becomes merely adding words to a file (wouldn't have to be a PR).
However yes, adding more things to deal with as a newcomer is a bad thing, the upside removes future PRs dedicated to spelling error fixes, which are repetitive and dull for reviewers.
I propose a trial of using it, what do you think?
So here is a version which actually works see #8353 for endless commits fixing syntax errors and other mistakes.
name: Codespell
on: ['pull_request']
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Get changed files
id: changed-files
uses: tj-actions/[email protected]
with:
files_ignore: pages.*/*/*
- uses: codespell-project/actions-codespell@master
with:
ignore_words_file: .github/codespell-ignore
only_warn: 1
path: ${{ steps.changed-files.outputs.all_changed_files }}
I will make a PR to be reviewed where a final decision can be made, the speed difference is almost nothing (if not 0) since the main CI takes 24 seconds whilst the codespell action takes 15 (don't fact check that, it was just the readings of the latest test run as of now)