grass CI: Run clang-format faster using a different action that runs in parallel

In the same goal of speeding up our CI, I'm suggesting here to use a different action to run clang-format, as there was no response (yet) from the currently used jidikula/clang-format-action https://github.com/jidicula/clang-format-action/issues/147#issuecomment-1817661437.

In summary, the current action calls to start a new Docker container for each file. It takes about 22 minutes to process our code base. By running by calling the binary with a list of files, and splitting the file list to run enough instances, https://github.com/DoozyX/clang-format-lint-action can process our code base in about 30 seconds.

The goal of having faster workflows, aven if they aren't our longest, is that since we have 20 runners available, shared across all OSGeo projects, when there is a lot of activity, we can start another job sooner.

I include here the analysis I wrote, after checking that out at the end of october.

If you want to see a potential use case, look at any clang-format run in the OSGeo/grass repo: https://github.com/OSGeo/grass/actions/workflows/clang-format-check.yml CI is often long before starting with the 20 available runners, since it is shared with other big projects in the geospatial field, like PROJ and GDAL.

In this repo, we have about 22 minutes to run clang-format. In order to check if the time spent was spent doing tasks, I used runforesight/workflow-telemetry-action@v1 to check CPU usage. I also compared to another action that uses a python runner instead (it's not simply python that should make the difference, but the way clang-format is called) : DoozyX/[email protected].

It is possible to collect all the files path/exclusions in a kind of list and then pass it all at once to clang-format. Even if it is a bit more complicated, super-linter (which uses shell scripts) uses this: https://github.com/super-linter/super-linter/blob/f3279a4414f050cf9dbb2b56db7eae31a9c8ee35/lib/functions/buildFileList.sh#L151-L193

Current action clang-format (~22 min for clang-format)

Workflow Telemetry - ClangFormat Check / Formatting Check

Workflow telemetry for commit 839680e69e88ad6bb79279adee67c38e6c371444 You can access workflow job details here

Step Trace

gantt
	title Formatting Check
	dateFormat x
	axisFormat %H:%M:%S
	Set up job : milestone, 1700341492000, 1700341493000
	Collect Workflow Telemetry : 1700341493000, 1700341494000
	Run actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 : 1700341494000, 1700341500000
	Run clang-format style check for C/C++/Protobuf programs. : 1700341500000, 1700342834000
	Post Run actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 : 1700342834000, 1700342834000

CPU Metrics

68747470733a2f2f6170692e676c6f62616467652e636f6d2f76312f636861727467656e2f737461636b65642d617265612f74696d652f63686172745f737461636b65645f617265615f74696d655f65326434313138342d363631332d343433342d613235612d3435373863

Memory Metrics

68747470733a2f2f6170692e676c6f62616467652e636f6d2f76312f636861727467656e2f737461636b65642d617265612f74696d652f63686172745f737461636b65645f617265615f74696d655f66373163633462642d653762382d343530642d623836332d3531396264

IO Metrics

	Read	Write
Network I/O
Disk I/O

Process Trace

Top 100 processes with highest duration

gantt
	title Formatting Check
	dateFormat x
	axisFormat %H:%M:%S
	node (checkout) : 1700341494031, 1700341500222
	git : 1700341494293, 1700341499411
	git : 1700341494296, 1700341499155
	git-remote-http : 1700341494298, 1700341499155
	git : 1700341497285, 1700341499399
	git : 1700341499414, 1700341500206
	bash : 1700341500280, 1700342834759
	check.sh : 1700341500282, 1700342834758
	bash : 1700341500282, 1700342834758
	docker : 1700341500318, 1700341505529
	docker : 1700341506736, 1700341507173
	docker : 1700341576924, 1700341577368
	docker : 1700341601245, 1700341601677
	docker : 1700341612338, 1700341612780
	docker : 1700341617968, 1700341618398
	docker : 1700341635297, 1700341635728
	docker : 1700341638114, 1700341638557
	docker : 1700341640501, 1700341641130
	docker : 1700341642316, 1700341643285
	docker : 1700341644061, 1700341644492
	docker : 1700341681129, 1700341681569
	docker : 1700341683889, 1700341684334
	docker : 1700341684741, 1700341685230
	docker : 1700341685234, 1700341685668
	docker : 1700341702724, 1700341703170
	docker : 1700341704345, 1700341704833
	docker : 1700341713141, 1700341713598
	docker : 1700341717528, 1700341717961
	docker : 1700341723568, 1700341724004
	docker : 1700341734412, 1700341734893
	docker : 1700341744392, 1700341744829
	docker : 1700341777768, 1700341778206
	docker : 1700341789957, 1700341790390
	docker : 1700341798644, 1700341799110
	docker : 1700341803069, 1700341803520
	docker : 1700341803523, 1700341803966
	docker : 1700341805161, 1700341805594
	docker : 1700341808818, 1700341809284
	docker : 1700341838615, 1700341839053
	docker : 1700341846244, 1700341846676
	docker : 1700341851378, 1700341851881
	docker : 1700341852712, 1700341853145
	docker : 1700341873992, 1700341874430
	docker : 1700341890052, 1700341890497
	docker : 1700341908661, 1700341909166
	docker : 1700341911844, 1700341912285
	docker : 1700341934051, 1700341934521
	docker : 1700341978757, 1700341979198
	docker : 1700342068588, 1700342069038
	docker : 1700342103426, 1700342104179
	docker : 1700342106153, 1700342106588
	docker : 1700342118617, 1700342119047
	docker : 1700342122929, 1700342123405
	docker : 1700342127422, 1700342127852
	docker : 1700342203474, 1700342203926
	docker : 1700342207502, 1700342207933
	docker : 1700342209092, 1700342209531
	docker : 1700342214630, 1700342215061
	docker : 1700342218602, 1700342219059
	docker : 1700342241387, 1700342241858
	docker : 1700342246286, 1700342246716
	docker : 1700342247845, 1700342248278
	docker : 1700342251824, 1700342252254
	docker : 1700342275121, 1700342275558
	docker : 1700342290457, 1700342291238
	docker : 1700342304149, 1700342304598
	docker : 1700342308937, 1700342309374
	docker : 1700342343789, 1700342344230
	docker : 1700342346998, 1700342347434
	docker : 1700342382540, 1700342383105
	docker : 1700342383109, 1700342383545
	docker : 1700342419022, 1700342419459
	docker : 1700342423378, 1700342423810
	docker : 1700342428493, 1700342428925
	docker : 1700342433989, 1700342434435
	docker : 1700342439989, 1700342440430
	docker : 1700342445189, 1700342445624
	docker : 1700342450029, 1700342450476
	docker : 1700342452423, 1700342453229
	docker : 1700342453232, 1700342453665
	docker : 1700342464009, 1700342464761
	docker : 1700342478961, 1700342479394
	docker : 1700342485376, 1700342485807
	docker : 1700342519129, 1700342519570
	docker : 1700342529498, 1700342529944
	docker : 1700342538739, 1700342539181
	docker : 1700342548254, 1700342548687
	docker : 1700342585718, 1700342586154
	docker : 1700342656476, 1700342656918
	docker : 1700342691073, 1700342691504
	docker : 1700342708154, 1700342708721
	docker : 1700342757171, 1700342757653
	docker : 1700342758051, 1700342758588
	docker : 1700342775511, 1700342775955
	docker : 1700342776336, 1700342776780
	docker : 1700342782728, 1700342783159
	docker : 1700342787481, 1700342787913
	docker : 1700342789916, 1700342790370
	docker : 1700342815887, 1700342816320
	docker : 1700342823727, 1700342824163

Faster clang-format (~31 secs for clang-format)

Workflow Telemetry - ClangFormat Check / Formatting Check

Workflow telemetry for commit 66f6736f1bb0aff522a5c6dd529775191c3ac7e5 You can access workflow job details here

Step Trace

gantt
	title Formatting Check
	dateFormat x
	axisFormat %H:%M:%S
	Set up job : milestone, 1700341522000, 1700341527000
	Build DoozyX/[email protected] : 1700341527000, 1700341533000
	Collect Workflow Telemetry : 1700341533000, 1700341533000
	Run actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 : 1700341533000, 1700341540000
	Run DoozyX/[email protected] : 1700341540000, 1700341556000
	Verify Changed files : 1700341556000, 1700341557000
	List all changed files tracked and untracked files : 1700341557000, 1700341557000
	Run actions/upload-artifact@v3 : 1700341557000, 1700341557000
	Post Verify Changed files : 1700341557000, 1700341557000

CPU Metrics

68747470733a2f2f6170692e676c6f62616467652e636f6d2f76312f636861727467656e2f737461636b65642d617265612f74696d652f63686172745f737461636b65645f617265615f74696d655f30376438313836632d626635632d346462652d613034372d6537663033

Memory Metrics

68747470733a2f2f6170692e676c6f62616467652e636f6d2f76312f636861727467656e2f737461636b65642d617265612f74696d652f63686172745f737461636b65645f617265615f74696d655f33373165316236652d636130652d343064342d383033392d3065326162

IO Metrics

	Read	Write
Network I/O
Disk I/O

Process Trace

Top 100 processes with highest duration

gantt
	title Formatting Check
	dateFormat x
	axisFormat %H:%M:%S
	node (checkout) : 1700341533443, 1700341540649
	git : 1700341533580, 1700341533581
	git : 1700341533588, 1700341533589
	git : 1700341533592, 1700341533598
	git : 1700341533600, 1700341533601
	git : 1700341533604, 1700341533605
	git : crit, 1700341533607, 1700341533608
	git : 1700341533611, 1700341533639
	git-submodule : 1700341533613, 1700341533638
	git : 1700341533616, 1700341533619
	gettext : 1700341533623, 1700341533623
	git : 1700341533627, 1700341533628
	git : 1700341533630, 1700341533631
	git : 1700341533631, 1700341533632
	git : 1700341533633, 1700341533634
	git : 1700341533634, 1700341533635
	git : 1700341533638, 1700341533639
	git : crit, 1700341533641, 1700341533642
	git : 1700341533643, 1700341533664
	git-submodule : 1700341533645, 1700341533665
	git : 1700341533647, 1700341533647
	gettext : 1700341533651, 1700341533651
	git : 1700341533655, 1700341533656
	git : 1700341533657, 1700341533658
	git : 1700341533658, 1700341533659
	git : 1700341533659, 1700341533660
	git : 1700341533661, 1700341533662
	git : 1700341533664, 1700341533665
	git : 1700341533667, 1700341533668
	git : 1700341533673, 1700341539839
	git : 1700341533675, 1700341539613
	git-remote-http : 1700341533677, 1700341539613
	git : 1700341537275, 1700341539828
	git : 1700341539829, 1700341539836
	git : 1700341539839, 1700341539840
	git : 1700341539843, 1700341540634
	git : 1700341540636, 1700341540637
	git : 1700341540640, 1700341540641
	docker : 1700341540688, 1700341556577
	bash : 1700341556637, 1700341557591
	bash : 1700341556639, 1700341557591
	git : 1700341556640, 1700341557085
	git : 1700341557088, 1700341557564
	git : 1700341557566, 1700341557581
	sort : 1700341557585, 1700341557586
	bash : 1700341557613, 1700341557614
	node (upload-artifact) : 1700341557621, 1700341557902
	node (checkout) : 1700341557919, 1700341558068
	git : 1700341557992, 1700341557993
	git : 1700341558000, 1700341558001
	git : crit, 1700341558004, 1700341558005
	git : 1700341558007, 1700341558031
	git-submodule : 1700341558009, 1700341558031
	git : 1700341558012, 1700341558012
	gettext : 1700341558016, 1700341558016
	git : 1700341558019, 1700341558020
	git : 1700341558021, 1700341558022
	git : 1700341558022, 1700341558023
	git : 1700341558024, 1700341558025
	git : 1700341558025, 1700341558026
	git : 1700341558029, 1700341558031
	git : 1700341558033, 1700341558034
	git : 1700341558036, 1700341558037
	git : 1700341558039, 1700341558062
	git-submodule : 1700341558041, 1700341558062
	git : 1700341558043, 1700341558043
	gettext : 1700341558047, 1700341558047
	git : 1700341558051, 1700341558052
	git : 1700341558052, 1700341558053
	git : 1700341558054, 1700341558055
	git : 1700341558056, 1700341558057
	git : 1700341558057, 1700341558058
	git : 1700341558060, 1700341558062

The most interesting thing to look at here is the difference between the CPU usage graphs:

Current action	Other faster action

Take note that the metrics were only fetched each 5 secs, so the one running only a minute is not as nice to see. So, even with a two core runner (that is what github gives us right?), about half of one core is used, with half of it being system time. So I think its fair to think starting a docker container for the 3357 files takes some amount of time.

I'm not knowledgable enough about bash to be able to suggest the syntax to place the results of the find into whatever variable it takes to give it to the command (inside docker). But it seems its possible to have an exclude regex in the find too, https://stackoverflow.com/questions/6745401/regular-expression-to-exclude-filetypes-from-find. So all could be made in one pass. Last thing: are they some limits on how long the file list can be to be passed sucessfully? If yes, then a batching of a couple thousands should be fine, since 3357 files divided in three "jobs" was able to be run. (The other one uses multiprocessing to launch more than one). Thus, I expect a runtime of maybe 1min30 or less to be perfectly possible if only one docker invocation is used with one process, and no other optimization.

Originally posted by @echoix in https://github.com/jidicula/clang-format-action/issues/147#issuecomment-1817661437

Dec 03 '23 18:12 echoix

I had one that I triggered with extra white space and the logs clearly showed me the number of processes files and fixed files. I'll try to find them in my fork, but since it's been months it'll need a new run to see the results.

Edit: see here: https://github.com/echoix/grass/actions/runs/6662606466/job/18816176322

It's not the same branch, but the original testing was done there (other changes were explored in this draft to help profile).

Dec 16 '23 02:12 echoix

I reread your comment, and I think that maybe you're asking that you want the action to fail completely when they are files that were reformatted.

I can add that, of course, I see that I was missing it. But another possibility would be to commit it, but as we saw, without a PAT actions won't rerun. Other possibility would be to add revision comments if there isn't a lot of changes to make (that could be accepted and committed from the UI). I've never done it yet, but it could be possible if that's the easiest way of using it.

If you're talking about the case where the code is so bad that even clang-format itself fails and can't format? In that case, if the action fails, well it fails, and we won't have a little extra help with the changed files ready to simply paste back and commit. Honestly, if the code was so bad it shouldn't be merged as is, and we hope other checks will fail too ;)

Dec 16 '23 02:12 echoix

...I think that maybe you're asking that you want the action to fail completely when they are files that were reformatted.

Yes, that's what I expect and meant to say. If formatting is wrong, fail so that it is clear the PR should not be merged.

Perhaps we can do smarter things, like a commit or preferably a comment, but it seems to me that these always need to be in addition to a failure which tell you clear "no, don't merge this". Clang check is set to Required to ensure that things are not merged with bad formatting. The original action does fail with bad formatting (at least it did in the past). I of course don't particularly care which workflow will ensure that as long as some workflow will.

Dec 16 '23 02:12 wenzeslaus

Ok, let me get back to it, but probably not tonight :)

Dec 16 '23 03:12 echoix

We may also remove this workflow, replace it with the use of pre-commit.ci, see #3255, which not only addresses this but also a number of other issues.

Dec 17 '23 21:12 nilason

Is that PR/configuration almost ready? If it is run on another service, who "administers" it, ie, can anyone in the repo help out to improve/fix config when there's problems or the person isn't there anymore?

Dec 18 '23 12:12 echoix

I didn't get to work on it yet for round 2, thus the draft status. Christmas parties are started ;)

Dec 18 '23 12:12 echoix

Is that PR/configuration almost ready?

The GRASS repo is ready now, it only remains to activate it. The advantage is that all pre-commit hooks defined in https://github.com/OSGeo/grass/blob/main/.pre-commit-config.yaml is run on a (one) runner image just as it would locally. In my quick test fork (linked in #3255) all was finished below 2 min (often closer to 1 min).

Current tests:

trim trailing whitespace[^1]
fix end of files[^1]
markdownlint
black[^1]
flake8
clang-format[^1]
yamllint

[^1]: may autofix a pull request

There is loads of CI time to save and – even more importantly – avoids discrepancies. With pre-commit.ci we may remove (or initially just disable) the 'Python Code Quality' (~9 min), 'ClangFormat check' (now ~20 min, with this PR ~1 min), 'General linting' (~8 min), with the added bonus of auto fixes.

If it is run on another service, who "administers" it, ie, can anyone in the repo help out to improve/fix config when there's problems or the person isn't there anymore?

This workflow is activated (once) on repository admin level, but is hence directed by .pre-commit-config.yaml. The infrastructure is located at https://github.com/pre-commit-ci.

Dec 18 '23 14:12 nilason

And would the checks be executed on forks, ie, would be able to run on a PR inside my own fork, to make sure everything is ready before submitting it upstream? (It's harder to investigate and prepare changes when "production" is different from "staging", where staging would be our own repos)

Dec 18 '23 17:12 echoix

And would the checks be executed on forks, ie, would be able to run on a PR inside my own fork, to make sure everything is ready before submitting it upstream?

If you activate pre-commit.ci on your fork, probably could. Anyways the checks are not enabled on forks as it now stands anyway. That's why you should use pre-commit locally, not drain unnecessary CI resources.

Dec 18 '23 17:12 nilason

There's been some progress on the original action: https://github.com/jidicula/clang-format-action/issues/147#issuecomment-1868563950 and https://github.com/jidicula/clang-format-action/pull/174

Dec 27 '23 01:12 echoix

Still a draft. The permission issue was probably the fact that the workflow isn't on main yet. I committed a copy on my main branch, and ran a PR with an error, and the comment appeared correctly with the suggestion. However, I ended up with a limitation of GitHub, where the way the suggestion is made, it can't be applied on deleted lines.

I still have another trick up my sleeve.

Mar 06 '24 03:03 echoix

Ready for a review. This will be working once it is not running from a fork. See my working side-PR here: https://github.com/echoix/grass/pull/62

I might need to rebase to clean up the intermediate commits needed. Are we allowed to create a branch in the main repo in order to confirm it works the same from that repo?

The code suggestions:

When there are some changes, covered by code suggestions:

When there are multiple files that need to be formatted, but none of them end up in the diff_context (changed lines, plus or minus 3 lines around) that allows to add a suggestion, I add the message that says to take the artifact archive and copy-paste into the repo to replace with the new files.

If any of you knows how to have an exit code as a bash variable/env-var, without actually having an exit code that would fail the step, I could remove completely one action. For now, I keep tj-actions/verify-changed-files only for the boolean output since I wasn't able to use the exit code from git diff --exit-code in any way.

Mar 06 '24 06:03 echoix

If any of you knows how to have an exit code as a bash variable/env-var,

The exit code of the previous command is stored in $?.

Success is 0.
Fail is 1, example:

cat doesnotexist.txt
cat: doesnotexist.txt: No such file or directory
echo $?
1

Mar 06 '24 10:03 neteler

...Are we allowed to create a branch in the main repo in order to confirm it works the same from that repo?

In this case, yes, there is no other way. Similar to what the bots are doing, although here it is for an experiment and not for an automated PR.

Mar 06 '24 11:03 wenzeslaus

If any of you knows how to have an exit code as a bash variable/env-var,

The exit code of the previous command is stored in $?.

Success is 0.

Fail is 1, example:
cat doesnotexist.txt

cat: doesnotexist.txt: No such file or directory

echo $?

1

I remember seeing $? somewhere. I just have to make sure that having that fist failing command doesn't stop the execution of the workflow, since a command "failed".

Mar 06 '24 12:03 echoix

...Are we allowed to create a branch in the main repo in order to confirm it works the same from that repo?

In this case, yes, there is no other way. Similar to what the bots are doing, although here it is for an experiment and not for an automated PR.

I have another idea. Can we work and agree to end up approving and merging the PR, but without the part of posting comments really enabled, and then enable it in another PR, but with the workflow already in main with the correct permissions?

Or we might want to go with a pull_request_target event, and pulling the code from the exterior fork, and try to run the formatting. I just have to verify how to do it safely, as that's the point of pull_request_target.

Mar 06 '24 12:03 echoix

Can we work and agree to end up approving and merging the PR, but without the part of posting comments really enabled...?

Sure, go ahead, I don't recall this being a requirement. I though the original motivation was speed and that's also specified in the PR title.

Mar 06 '24 20:03 wenzeslaus

So as it is right now:

If there is no clang-format issues, there is no error, even if it is from a fork like now.
If there is some files changed by clang-format, the list is shown on the logs, and also outputted in the job summary (when we open the run details, underneath the graph and where the artifacts can be downloaded).
Also if there is a formatting error (files changed), then extra work is done: an action will try to post code review comments with suggestions. The step is configured to exit with a failure when it posts comments.
Comments on lines of code can only be added up to 3 lines around the lines changed in a PR. So this action filters out what it cannot post. If it doesn't post comments, it doesn't fail with an error.
If the workflow didn't fail at posting suggestions, and there was some files changed, then an extra step is ran to add more info in the job summary that says that all these files should be changed, and then exit with error.
Since the files were already changed by the workflow, a step uploading them as an artifact is ran even if there is an error, in order to just apply them manually.

The only uncertainty left now, is that is the failure to post code review comments (95% sure due to permissions), is due to the fact that the action in the main branch didn't use the token with these permissions, or its because it is a PR from a fork? In both cases, the step that would fail fails, but for another reason. I suggest we try it out, as the worst is that the best-effort to simplify our approval of PRs didn't work, and is the same as not having it.

I also added a workflow_dispatch trigger as an escape for us: since when we (with access to the repo) run it manually, the actor is us and should work for a PR (if we are able to select the correct target), just a bit more work.

So I suggest that we merge it as is for now, in the seconds is way better than 20 mins.

A little cleanup of the squash message should be done for this.

Mar 06 '24 21:03 echoix

I cleaned up the squash message in the auto merge

Mar 06 '24 21:03 echoix

I'm eager to see if the action suggester works correctly once merged. If so, there's a lot of opportunities to use them in our CI, black for example, and even shellcheck+shfmt

Mar 06 '24 22:03 echoix

I'll go and implement it for the grass-addons two, and merge it myself there, so it'll be already tested out in a more "staging"-like repo, since we seem a bit more precautious here.

Mar 09 '24 12:03 echoix

The maximum token permissions for a PR coming from a fork is listed in the last column of the table here: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token

I'm working since this morning on a solution, after reading a lot. Even if technically, using pull_request_target and pulling the PR's branch could be argued safe here since we are only linting, it's not that clear that it will always be safe. I'm working on a more complete solution.

Mar 09 '24 18:03 echoix

Yesterday night I managed to get something working as I'd like. I'll be cleaning this soon, and I made myself an organization to have an external fork to create a pull request against when testing, and this time it works.

Mar 10 '24 12:03 echoix

grass grass copied to clipboard

CI: Run clang-format faster using a different action that runs in parallel

Workflow Telemetry - ClangFormat Check / Formatting Check

Step Trace

CPU Metrics

Memory Metrics

IO Metrics

Process Trace

Top 100 processes with highest duration

Workflow Telemetry - ClangFormat Check / Formatting Check

Step Trace

CPU Metrics

Memory Metrics

IO Metrics

Process Trace

Top 100 processes with highest duration

grass
grass copied to clipboard