bridle icon indicating copy to clipboard operation
bridle copied to clipboard

Validate the new ClangFormat module in ci/check_compliance.py

Open rexut opened this issue 7 months ago • 2 comments

I did a small test of specifically the formatting check (running check_compliance.py -m ClangFormat), and it doesn't seem to work correctly.

As a test scenario I copied the button sample into new_button, ran clang-format on the new files, and commited that.

I then make a small change, adding a declaration with a comment:

Modified regular file samples/new_button/src/event.c:
    ...
   6    6: #include <zephyr/device.h>
   7    7: #include "init.h"
   8    8: 
        9: int a; /* This is a */
       10: 
   9   11: /* Interrupt handler  */
  10   12: static void button_pressed(const struct device *dev, struct gpio_callback *cb, uint32_t pins)
  11   13: {

Running against the series of now 2 commits is still green, as it should.

I then make another change, adding another declaration:

Modified regular file samples/new_button/src/event.c:
    ...
   7    7: #include "init.h"
   8    8: 
   9    9: int a; /* This is a */
       10: int b_has_a_very_long_name; /* This is b*/
  10   11: 
  11   12: /* Interrupt handler  */
  12   13: static void button_pressed(const struct device *dev, struct gpio_callback *cb, uint32_t pins)

The file is now incorrectly formatted, and the check points this out, but the message is confusing:

See https://docs.zephyrproject.org/latest/contribute/guidelines.html#clang-format for more details.

 You may want to run clang-format on this change:
-int a; /* This is a */
+int a;                      /* This is a */

File:samples/new_button/src/event.c
Line:10

It says I may want to run clang-format on a change, but what it shows isn't a change that I made; instead it shows the correction that should be applied. It also doesn't tell me which commit it found this in, but I from what I observe later I'm not sure it even knows itself.

I then make a final change, which is to format the file correctly:

Modified regular file samples/new_button/src/event.c:
    ...
   6    6: #include <zephyr/device.h>
   7    7: #include "init.h"
   8    8: 
   9    9: int a;                      /* This is a */
  10   10: int b_has_a_very_long_name; /* This is b*/
  11   11: 
  12   12: /* Interrupt handler  */
    ...

I now have a series of 3 commits, one of which is bad. But when I run the check again, it's green, missing the problem in the second commit.

But it gets even more confusing: I had been running the check after each commit, with -c main to set the commit range. Now that I have the full series, I'm curious if just running it commit by commit gives more useful results.

My log looks like this now (oldest changes at the top, newest at the bottom. The git commit hashes are on the right hand side, the numbers on the left are jujutsu change ids that can be ignored.):

◆  uzzpzmxz [email protected] 2025-05-28 09:59:27 main next@origin 1f570f29
│  tests: Emulator based tests for sc18is604 I2C driver
○  nvtnlqtw [email protected] 2025-06-05 12:45:47 f1964c98
│  samples: New sample
○  rwwowvno [email protected] 2025-06-05 12:45:47 64ada3e2
│  Made a change
○  llnukymt [email protected] 2025-06-05 12:45:47 f91855da
│  Added b
○  optwtlwm [email protected] 2025-06-05 12:45:48 git_head() 9e80e0c2
│  fixed formatting
@  vxqynovw [email protected] 2025-06-05 13:01:23 b380e9a4
   (empty) (no description set)

I try running the check with -c main..f1964c98. Clean, as it should. I try -c f1964c98..f1964c98. Also clean, as it should. Next is the interesting one: -c 64ada3e2..f91855da. It shows up clean. That's odd. For completeness, I also try -c f91855da..9e80e0c2. It's clean, which is expected if the commit range excludes the first referent (see below for why that doesn't seem to be right, either).

Maybe the -c 64ada3e2..f91855da run only sees the diff, which itself is obviously correctly formatted? It's the existing comment above that's wrong.

But then I run with -c main..f91855da again, which should be the equivalent of running with -c main right after commiting f91855da. But the check also comes up clean. I also tries -c main..HEAD~1 in case it doesn't understand the hash, and even -c HEAD~4..HEAD~1, but that also comes up clean.

As a final experiment, I insert a new commit after f91855da, inserting another badly formatted change:

Modified regular file samples/new_button/src/event.c:
    ...
   9    9: int a; /* This is a */
  10   10: int b_has_a_very_long_name; /* This is b*/
  11   11: 
       12:      int c;
       13: 
  12   14: /* Interrupt handler  */
  13   15: static void button_pressed(const struct device *dev, struct gpio_callback *cb, uint32_t pins)
  14   16: {
    ...

This becomes 7396a4ae. With my HEAD on this new commit, I run with -c main: It finds both formatting errors. If I run with -c HEAD~1..HEAD, it only shows the new error. That is correct behavior, apart from not telling me which commits are the problem(s).

I modify my final commit to fix the new error as well, leaving me with this log (the commit hash of change optwtlwm changed because it was both rebased and modified):

◆  uzzpzmxz [email protected] 2025-05-28 09:59:27 main next@origin 1f570f29
│  tests: Emulator based tests for sc18is604 I2C driver
○  nvtnlqtw [email protected] 2025-06-05 12:45:47 f1964c98
│  samples: New sample
○  rwwowvno [email protected] 2025-06-05 12:45:47 64ada3e2
│  Made a change
○  llnukymt [email protected] 2025-06-05 12:45:47 f91855da
│  Added b
○  mknrrlpt [email protected] 2025-06-05 13:27:12 07ff626e
│  Added c
○  optwtlwm [email protected] 2025-06-05 13:27:33 git_head() cdce64ce
│  fixed formatting
@  xlsqrvwp [email protected] 2025-06-05 13:27:34 4e3f4ec1
   (empty) (no description set)

I run the check on various commit ranges, but as long as HEAD is on a commit that itself is clean I can't get it to find the errors, no matter what I try.

At this point I'm just confused. The -c parameter clearly isn't ignored, if my HEAD is bad it determines if it only finds errors in the last commit or in previous commits as well. But it's also clearly not working correctly if HEAD isn't included in the range.

And as a final and maybe most confusing data point: If I put the HEAD on f91855da and run against f91855da..cdce64ce, i.e. from the introduction of the first error to the commit that fixes both errors, it finds the first error. But not the second. This suggests that either:

  • The commit range does include the first referent somehow (which means we shouldn't be running -c main, but start one commit later?)
  • The range somehow always includes HEAD and treats it specially.

Overall I'm now:

  • Convinced the check isn't applying its -c parameter correctly
  • Doubtful it's checking individual commits at all
  • Very confused

Originally posted by @Irockasingranite in https://github.com/tiacsys/bridle/issues/327#issuecomment-2943901527

rexut avatar Jun 06 '25 07:06 rexut

To be clear, I don't think this issue is specific to the ClangFormat check, that was simply the easiest to test. I rather suspect that most, if not all, checks in check_compliance.py aren't being run correctly against the specified commit ranges.

Looking at check_compliance.py confirms this—the passed commit range is being set as a global, but many checks only generate a diff for the entire range and run on that. Worse, some checks don't use the range at all and appear to operate on whatever is currently in the working copy. That would make those checks entirely wrong if HEAD isn't currently at the top of the commit range. If it is, they are only wonky in the same way as most other checks.

As for ClangFormat specifically, it seems to try to check the diff over the whole specified commit range and run it through clang-format-diff.py. That's not ideal for multiple reasons, but none of those reasons explain the inconsistency I saw during the tests chronicled above.

I'll try digging more deeply into that check in particular to figure out the strange behavior.

Irockasingranite avatar Jun 10 '25 07:06 Irockasingranite

Okay, it seems to be clang-format-diff.py that's doing unexpected things. Depending on the state of my working copy, it some does and sometimes doesn't see formatting errors.

From what I can glean without untangling the regexes used, it seems to:

  1. take in a diff
  2. parse the diff only insofar as to determine the set of changes lines
  3. takes the linenumbers of those lines
  4. runs clang-format on the files in the working copy, constrained to the line numbers determined from the diff

This does explains why I was seeing incorrect output depending on the state of my working copy.

I can see this behavior being useful if

  • Pull requests are always fully squashed before merging
  • And the check is only ever run in a controlled environment with a clean working copy on the final squashed commit

For workflows like ours (fast-forward merges only) we should either

  • Wrap the check in a way that calls it on individual checkouts of each commit in the given range, always using HEAD~1..HEAD as the diff.
  • Not rely on clang-format-diff.py at all at that point and instead just run clang-format on changed files for each commit.

Similar considerations should be made for each other check in the list.

Irockasingranite avatar Jun 10 '25 07:06 Irockasingranite