endless-sky icon indicating copy to clipboard operation
endless-sky copied to clipboard

feat: Expanded CI code style test

Open tibetiroka opened this issue 3 years ago • 6 comments

Feature: This PR introduces a new script that checks the formatting of source files. Although this isn't a replacement for code reviews, it would still be a powerful tool to aid reviewers.

This script mainly aims to detect issues not handled by editorconfig or clang-format.

Feature Details

This script is integrated into Github Workflows to check any changes made by pull requests. The script issues errors or warnings depending on the severity of the issue. If any errors were found, the check fails. Errors:

  • Missing whitespaces around operators (+, -, <=, etc.) in most cases
  • Missing whitespace before */ and after /* and //
  • Trailing whitespace at the end of line
  • Missing line break between if, else if, else, for, switch, while, catch and the following {
  • Missing whitespace before { except if it follows a ( or is part of an inline or struct declaration
  • Multiple consecutive whitespace characters (except in the initial indentation of the lines, and in stylized tables)
  • Extra whitespace after ( except if it is part of a macro call (all-caps name, used in tests), or is followed by a semicolon (empty parameter in for)
  • Extra whitespace before ')', except if it follows a semicolon or if it is after an all-caps word
  • Extra whitespace between if, else, switch and the following (
  • An if, else if, else, for, do, switch, try or catch statement after the beginning of a line
  • A semicolon not at the end of a line, except if it is followed by a ) or a }
  • A line break between try or do and the following {
  • Non-ASCII characters
  • Incorrect copyright headers (to some degree)
  • Lines over 120 characters long (limit for hard wrapping in the style guide)
  • The use of tabulators outside indentation
  • Missing whitespace after comma
  • The use of multiline comments other than the copyright header (other than anonymous parameter names in functions)
  • The use of C-style casts
  • The use of (void) for functions without arguments

Warnings:

  • Missing #include for a .cpp file's matching .h header, unless the filename begins with a lowercase character, or this include not being the first in the list of includes
  • Using both #include "" and #include <> in the same paragraph; opengl.h is treated as a <> include in this case
  • Non-alphabetical ordering of includes within the same paragraph

The script outputs any formatting errors in the following format:

Formatting error in file source/WeightedList.h line 147: if (tw == 0) return 0;
	Reason: extra whitespace before '('

Or, for warnings:

Warning: missing empty line before changing include style in tests/unit/src/test_main.cpp line 15: #include <ctime>

The number of errors and warnings is also output after all checks have finished.

Task List

This PR is a draft since there are various tasks that need doing, and I'd appreciate help with any of these:

  • [x] Create CI integration
  • [x] Clean up code
  • [x] Test all wacky C++ syntax that might be added to the project
    • I'm not saying this is definitely going to be 100% accurate, but it works on the 80k+ lines of code that is currently in the project, so it pretty is unlikely to fail on anything new.
  • [x] Add more detected formatting issues
    • I'm now officially out of ideas, and the script checks most "checkable" things in the C++ Style Guide.

Testing Done

This tool was used to detect (most) issues fixed in #7146, #7179, #7216, and led to the discovery of #7207.

Performance Impact

The new action takes around 25-30 seconds to run, with 15-20 seconds of it being spent on this script.

tibetiroka avatar Aug 19 '22 21:08 tibetiroka

https://github.com/endless-sky/endless-sky/pull/4411 contains an example of how to include this script in CI-runs, and might also contain inspiration for possible other issues to detect. (And thanks for implementing this checker in Python and not in Bash.)

Please also consider to add an "ignore list" for issues in code that mismatches with the style guide, but for which we have good reason to ignore the style guide in specific situations. (The style guide contains very strong recommendations, but we are allowed to make exceptions.)

petervdmeer avatar Aug 20 '22 18:08 petervdmeer

General rearchitecture request: each rule should be a single .py file.

I'm not sure how viable that would be. Many features would either have to be duplicated in each file, or we would need to have some kind of data-sharing feature. The regexes alone would take dozens of files, all of which require input sanitization and some of the regex excludes would have to be shared as well.

At best, I could separate the regex rules from the ones that don't need input sanitization (copyright, line length and #include checks). If you think that's useful, sure thing.

tibetiroka avatar Sep 06 '22 17:09 tibetiroka

Oh boy. This PR blocks itself from merging until the cleanup PR is merged. Seems like tehhowch's suggestion got the CI working properly.

tibetiroka avatar Sep 06 '22 17:09 tibetiroka

General rearchitecture request: each rule should be a single .py file.

I'm not sure how viable that would be. Many features would either have to be duplicated in each file, or we would need to have some kind of data-sharing feature. The regexes alone would take dozens of files, all of which require input sanitization and some of the regex excludes would have to be shared as well.

Why would things need to be duplicated across files? You would import common functionality into the files that need it.

I requested this change because you've implemented a lint program entirely as a monolith with no unit tests, which makes it very difficult to extend or fix.

You can take a peek at how the ESLint program is organized to see an example of this architecture.

(If this also feels "off" because we'd be adding a bunch of python code to support a style check for our c++ program, you'd be right, too. This lint program and the action that consumes it belong most readily in an endless-sky/actions repository, not here.)

tehhowch avatar Sep 07 '22 01:09 tehhowch

I requested this change because you've implemented a lint program entirely as a monolith with no unit tests, which makes it very difficult to extend or fix.

I do agree that the code needs to be easy to extend and easy to fix, but I'm not sure if splitting the file helps to achieve this.

My first impression (after only briefly looking at the code) is the that list of regular expressions are well documented and also quite easy to extend. Moving each regular expression to it's own file feels like a step back in maintainability, because having the list in one place does help for people new to the code to quickly spot the way of thinking for line_include, segment_include, word_include.

Writing unit-tests for a style-checker only makes sense when we expect the style-checker to start living as a product on its own. If it is only used within ES and nowhere else, then I would go for a "don't let perfect be the enemy of good" approach and consider the ES code itself (together with the code-review on the PR) sufficient unit-test for the checker. I already see a lot of value in a style-checker that just works for us; such a style checker already reduces the review-load. (It should be easy to extend and easy to fix, that is a minimum level of quality which I don't want to compromise on.)

You can take a peek at how the ESLint program is organized to see an example of this architecture.

I had a very brief look at https://github.com/eslint/eslint to check what we could learn from there, but I didn't find the examples. Could you provide a more direct link?

This lint program and the action that consumes it belong most readily in an endless-sky/actions repository, not here.

Moving stuff to their own repositories is a good idea, but maybe better to do this with a separate issue and in a separate PR where we identify and move "sets of related" scripts/functionality all in 1 go?

petervdmeer avatar Sep 07 '22 07:09 petervdmeer

I agree with Peter on this one, but I'd still like to add a couple of details here.

  1. This program isn't a fully-featured lint. For the most part (regexes), this script checks each line (or segment or word) independently of each other. It performs some extremely specific heuristic checks that don't cover every case. And by that I mean this isn't going to detect all formatting issues; that would need proper language processing and more context-aware parsing of the code. On the other hand, this extremely selective behaviour makes it easy to maintain and makes false positives extremely unlikely. For the same reason, our detection rate should be pretty high on the supported regexes.

  2. We already have a bunch of bash and even a python and a powershell script in the utils directory. I don't think this script is going to be any harder to maintain; the initial period might be a bit rough, but we got through most of that already. And, to be fair, I don't see why the use of python would be worse than bash. The biggest issue is probably the use of regexes; I've worked with regexes a lot more than with python, and it was still more challenging to use in this context.

tibetiroka avatar Sep 07 '22 18:09 tibetiroka

I'm requesting a review from quyykk and Peter. Quyykk was an essential part of the development process of this PR, especially in code styling questions, while Peter has worked on something similar in the past, so their opinions would be greatly valued here.

Hurlevur has said he doesn't really know python so he couldn't review this PR, Tehhowch is probably not going to appear any time soon, so this is one of the only options for actually getting reviews on this thing.

tibetiroka avatar Sep 25 '22 10:09 tibetiroka

Sorry, I had some pending review-comments (which by accident were not posted yet).

petervdmeer avatar Sep 25 '22 11:09 petervdmeer

Although it doesn't seem to limit its search to the source/ and tests/unit/ directory.

Peter wanted the script to be independent of any directory structure changes (probably because of his refactor pr), which it already was. If I limit the search to those directories, I'd be going against that goal, and I also fail to see what it would gain for us.

Also, the actual filter on the search is based on file extensions, and I don't think any .cpp or .h files will be added to the project that are not subject to these formatting rules.

tibetiroka avatar Sep 30 '22 16:09 tibetiroka

I addressed the relevant parts of @tehhowch's review from the other PR:

  • removed globals
  • moved file I/O to main
  • added check for line separators (in case Git / GitHub doesn't force them)
  • Moved the compilation of regexes to their declaration, instead of reassigning later
    • I kept the inner dictionaries though, because I don't think using them will ever become a performance issue, and they look cleaner in my opinion
  • Removed 'file' from errors/warnings, and reformatted the output
    • The new format is:

       source/Account.cpp
       	ERROR: line 40: consecutive whitespace characters in 'credits  = 0;'
       	WARNING: line 22: includes are not in alphabetical order in '#include <sstream>'
      
    • In this system, errors are always displayed before warnings for a particular file, although that is easy to change.

  • Removed the file parameter from most functions. Some checks require the file name/path (like the copyright or include checks), so i still had to pass it to some functions, but not to most.

EDIT: The checks failed because somebody decided to choose this time to warn us about the Ubuntu 18.04 images in CI.

tibetiroka avatar Oct 18 '22 15:10 tibetiroka

I moved the style fixes to this PR, as requested in the discord server.

tibetiroka avatar Nov 15 '22 19:11 tibetiroka

This is a useful set of style checks and all review-comments seem to be addressed. Let's get those checks to production.

petervdmeer avatar Nov 16 '22 13:11 petervdmeer