feat(actions): add `pre-commit` framework with `codespell`
From the official site: "Git hook scripts are useful for identifying simple issues before submission to code review. We run our hooks on every commit to automatically point out issues in code such as missing semicolons, trailing whitespace, and debug statements. By pointing these issues out before code review, this allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks."
https://pre-commit.com/
Using the `pre-commit` framework speeds up development because many checks can run on the local machine and give instant feedback, so we don't have to wait for CI / GitHub Actions to run. `pre-commit` automatically fixes some of the issues when you run `git commit`; if any issues are found, the affected hooks are marked as failed (red). You then need to commit again until all the hooks pass (green).
When `pre-commit` runs in GitHub Actions, each hook simply passes or fails.
There are many more pre-commit checks listed here -> https://pre-commit.com/hooks.html
Let's get this PR merged, and then I will look at adding more pre-commit tests 👍
This PR adds `codespell` to our `pre-commit` hooks.
The words in `codespell.txt` are ignored; this file was basically created by running the following from the repo root:
codespell . | cut -f2 -d' ' | tr A-Z a-z | sort | uniq > codespell.txt
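To make the pipeline above easier to follow, here is a small sketch that applies the same post-processing to a few hand-written lines instead of a real `codespell` run. It assumes `codespell` reports each finding as `<file>:<line>: <typo> ==> <suggestion>`; the file names and typos in `sample_output` are made up for illustration.

```shell
# Made-up findings in the shape codespell is assumed to print them:
#   <file>:<line>: <typo> ==> <suggestion>
sample_output='./main/Foo.java:12: teh ==> the
./main/Bar.cxx:40: Recieve ==> Receive
./main/Baz.py:7: teh ==> the'

# Same post-processing as the original command:
#   cut         keeps the second space-separated field (the flagged word)
#   tr          lowercases it
#   sort | uniq leaves one entry per distinct word
ignore_words=$(printf '%s\n' "$sample_output" | cut -f2 -d' ' | tr 'A-Z' 'a-z' | sort | uniq)

printf '%s\n' "$ignore_words"
```

Because `cut` splits on spaces, a finding in a file whose path contains a space would yield the wrong field, so the generated list is worth a quick manual review.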
`codespell` is one of the leading spell checkers on GitHub.
Going forward, we will need to fix many of the misspelled words that are currently listed in `codespell.txt`.
https://github.com/codespell-project/codespell https://pypi.org/project/codespell/
From the official repo:
"Fix common misspellings in text files. It's designed primarily for checking misspelled words in source code (backslash escapes are skipped), but it can be used with other files as well."
So it checks a lot more than just comments.
Some of the misspelled words may just be code terms that we need to ignore.
That's a huge number of misspellings.
About the linter: is it only checking comments?
If needed you can also exclude files / folders from being spell checked.
Apache Airflow is using `codespell` with `pre-commit`, as seen at the following link:
https://github.com/apache/airflow/blob/41f4766d5b4873ddbf8daa94d837398342aeaf98/.pre-commit-config.yaml#L274
They have about 1,800 lines in their ignored-words list, seen here:
https://github.com/apache/airflow/blob/main/docs/spelling_wordlist.txt
`/extras` would definitely need to be excluded. It contains the translations for all languages, so I would expect a lot of "false positives".
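For reference, a hook entry that combines the ignore list with such an exclusion might look roughly like the sketch below. It is not taken from this PR's diff: the pinned `rev` is an assumed tag, and the `exclude` pattern assumes the translations live in a top-level `extras/` directory. The file is written to a scratch directory so an existing `.pre-commit-config.yaml` is not overwritten.

```shell
# Write the sketch to a scratch directory rather than the repo root.
demo_dir=$(mktemp -d)

cat > "$demo_dir/.pre-commit-config.yaml" <<'EOF'
repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6  # assumed tag; pin whichever release the project settles on
    hooks:
      - id: codespell
        # Words listed in codespell.txt are not reported.
        args: [--ignore-words=codespell.txt]
        # Assumption: translations live in a top-level extras/ directory.
        exclude: ^extras/
EOF

cat "$demo_dir/.pre-commit-config.yaml"
```

`exclude` is a regular expression that `pre-commit` matches against file paths before they are handed to the hook, so files under `extras/` would never reach `codespell` at all.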