openoffice

feat(actions): add `pre-commit` framework with `codespell`

Open jbampton opened this issue 1 year ago • 3 comments

Official -> "Git hook scripts are useful for identifying simple issues before submission to code review. We run our hooks on every commit to automatically point out issues in code such as missing semicolons, trailing whitespace, and debug statements. By pointing these issues out before code review, this allows a code reviewer to focus on the architecture of a change while not wasting time with trivial style nitpicks."

https://pre-commit.com/

Using a pre-commit framework speeds up development because many checks can run on the local machine and give instant feedback, so we don't have to wait for CI / GitHub Actions to run. pre-commit automatically fixes some issues when you run `git commit`; if any hook finds a problem or changes a file, it is marked as failed (red) and the commit is stopped. You then need to stage the changes and commit again so that all the hooks pass (green).
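
As a rough sketch of that local workflow, assuming pre-commit is installed from PyPI and the repository already has a `.pre-commit-config.yaml`: the function name `run_hooks_locally` is made up for this example, while `pre-commit install` and `pre-commit run --all-files` are the standard pre-commit commands.

```shell
# Hypothetical wrapper around the usual pre-commit workflow; the function
# name is invented for this sketch and is not part of the PR.
run_hooks_locally() {
  if ! command -v pre-commit >/dev/null 2>&1; then
    echo "pre-commit not found; install it first (for example: pip install pre-commit)" >&2
    return 127
  fi
  # Register the Git hook so the checks run on every `git commit`.
  pre-commit install || return
  # Run every configured hook against the whole tree, not just staged files.
  pre-commit run --all-files
}
```

Calling `run_hooks_locally` from the repo root should give the same pass/fail picture as CI. If a hook rewrites files, stage them with `git add` and commit again.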

When pre-commit runs with GitHub Actions on the GitHub website the hooks/tests either pass or fail.

There are many more pre-commit checks listed here -> https://pre-commit.com/hooks.html

Let's get this PR merged and then I will look at adding more pre-commit tests 👍

This PR adds codespell to our pre-commit hooks.

The words in codespell.txt are ignored and this file has basically been created by running:

codespell . | cut -f2 -d' ' | tr A-Z a-z | sort | uniq > codespell.txt

from the repo root.
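
To make the pipeline easier to follow, here is a small sketch of what each stage does, run on made-up sample lines rather than real output from this repo. It assumes codespell prints one finding per line in the form `path:line: word ==> suggestion`, so the misspelled word is the second space-separated field. The names `build_ignore_list` and `sample_report` are hypothetical.

```shell
# Hypothetical helper: turn codespell report lines into a lowercase, sorted,
# de-duplicated word list. Assumes lines look like "path:line: word ==> fix".
build_ignore_list() {
  cut -d' ' -f2 |                    # keep the second field (the flagged word)
    tr '[:upper:]' '[:lower:]' |     # fold case so "Teh" and "teh" collapse
    sort -u                          # sort and drop duplicates in one step
}

# Sample report lines, invented for illustration only.
sample_report() {
  printf '%s\n' \
    './main/a.cxx:10: teh ==> the' \
    './main/b.hxx:42: Teh ==> the' \
    './main/c.idl:7: recieve ==> receive'
}

sample_report | build_ignore_list
```

With the real tool this would be `codespell . | build_ignore_list > codespell.txt`. One caveat of cutting on spaces: a file path that itself contains a space would shift the fields and put the wrong token in the list.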

https://github.com/codespell-project/codespell

codespell is one of the leading spell checkers on GitHub.

Going forward, we will need to fix a lot of the misspelled words that are in codespell.txt.

jbampton avatar Dec 01 '23 19:12 jbampton

https://github.com/codespell-project/codespell

https://pypi.org/project/codespell/

From the official repo:

"Fix common misspellings in text files. It's designed primarily for checking misspelled words in source code (backslash escapes are skipped), but it can be used with other files as well."

So it checks a lot more than just comments.
Some of the misspelled words may just be code terms that we need to ignore.

jbampton avatar Dec 02 '23 08:12 jbampton

That's a huge number of misspellings

About the linter, is it only checking comments?

If needed you can also exclude files / folders from being spell checked.

Apache Airflow uses codespell with pre-commit, as seen at the following link:

https://github.com/apache/airflow/blob/41f4766d5b4873ddbf8daa94d837398342aeaf98/.pre-commit-config.yaml#L274

And they have about 1,800 lines in their ignored words list seen here:

https://github.com/apache/airflow/blob/main/docs/spelling_wordlist.txt

jbampton avatar Dec 02 '23 09:12 jbampton

/extras would definitely need to be excluded. It contains the translations for all languages, so I would expect a lot of "false positives".
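
A minimal sketch of how that exclusion might look in the pre-commit configuration, written as a shell snippet that drops the file into a scratch directory so nothing in a real checkout is overwritten. The pinned `rev` and the `^extras/` pattern are assumptions for illustration and are not taken from this PR; `codespell.txt` is the ignore list discussed earlier in the thread.

```shell
# Write a sketch of .pre-commit-config.yaml into a temporary directory.
workdir="$(mktemp -d)"
cat > "$workdir/.pre-commit-config.yaml" <<'EOF'
repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6  # assumed pin; use whichever release the project settles on
    hooks:
      - id: codespell
        # Ignore the words listed in codespell.txt.
        args: [--ignore-words=codespell.txt]
        # Regex matched against repo-relative paths; skips the translations.
        exclude: ^extras/
EOF

# Show the generated file for review.
cat "$workdir/.pre-commit-config.yaml"
```

pre-commit treats `exclude` as a regular expression on repo-relative paths, so `^extras/` only matches the top-level directory. When running codespell directly instead of through pre-commit, its `--skip` option serves a similar purpose.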

Pilot-Pirx avatar Dec 02 '23 10:12 Pilot-Pirx