grass
grass copied to clipboard
CI: Run clang-format faster using a different action that runs in parallel
In the same goal of speeding up our CI, I'm suggesting here to use a different action to run clang-format, as there was no response (yet) from the currently used jidikula/clang-format-action https://github.com/jidicula/clang-format-action/issues/147#issuecomment-1817661437.
In summary, the current action calls to start a new Docker container for each file. It takes about 22 minutes to process our code base. By running by calling the binary with a list of files, and splitting the file list to run enough instances, https://github.com/DoozyX/clang-format-lint-action can process our code base in about 30 seconds.
The goal of having faster workflows, aven if they aren't our longest, is that since we have 20 runners available, shared across all OSGeo projects, when there is a lot of activity, we can start another job sooner.
I include here the analysis I wrote, after checking that out at the end of october.
If you want to see a potential use case, look at any clang-format run in the OSGeo/grass repo: https://github.com/OSGeo/grass/actions/workflows/clang-format-check.yml CI is often long before starting with the 20 available runners, since it is shared with other big projects in the geospatial field, like PROJ and GDAL.
In this repo, we have about 22 minutes to run clang-format. In order to check if the time spent was spent doing tasks, I used runforesight/workflow-telemetry-action@v1 to check CPU usage. I also compared to another action that uses a python runner instead (it's not simply python that should make the difference, but the way clang-format is called) : DoozyX/[email protected].
It is possible to collect all the files path/exclusions in a kind of list and then pass it all at once to clang-format. Even if it is a bit more complicated, super-linter (which uses shell scripts) uses this: https://github.com/super-linter/super-linter/blob/f3279a4414f050cf9dbb2b56db7eae31a9c8ee35/lib/functions/buildFileList.sh#L151-L193
Current action clang-format (~22 min for clang-format)
Workflow Telemetry - ClangFormat Check / Formatting Check
Workflow telemetry for commit 839680e69e88ad6bb79279adee67c38e6c371444 You can access workflow job details here
Step Trace
gantt title Formatting Check dateFormat x axisFormat %H:%M:%S Set up job : milestone, 1700341492000, 1700341493000 Collect Workflow Telemetry : 1700341493000, 1700341494000 Run actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 : 1700341494000, 1700341500000 Run clang-format style check for C/C++/Protobuf programs. : 1700341500000, 1700342834000 Post Run actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 : 1700342834000, 1700342834000CPU Metrics
Memory Metrics
IO Metrics
Read Write Network I/O Disk I/O Process Trace
Top 100 processes with highest duration
gantt title Formatting Check dateFormat x axisFormat %H:%M:%S node (checkout) : 1700341494031, 1700341500222 git : 1700341494293, 1700341499411 git : 1700341494296, 1700341499155 git-remote-http : 1700341494298, 1700341499155 git : 1700341497285, 1700341499399 git : 1700341499414, 1700341500206 bash : 1700341500280, 1700342834759 check.sh : 1700341500282, 1700342834758 bash : 1700341500282, 1700342834758 docker : 1700341500318, 1700341505529 docker : 1700341506736, 1700341507173 docker : 1700341576924, 1700341577368 docker : 1700341601245, 1700341601677 docker : 1700341612338, 1700341612780 docker : 1700341617968, 1700341618398 docker : 1700341635297, 1700341635728 docker : 1700341638114, 1700341638557 docker : 1700341640501, 1700341641130 docker : 1700341642316, 1700341643285 docker : 1700341644061, 1700341644492 docker : 1700341681129, 1700341681569 docker : 1700341683889, 1700341684334 docker : 1700341684741, 1700341685230 docker : 1700341685234, 1700341685668 docker : 1700341702724, 1700341703170 docker : 1700341704345, 1700341704833 docker : 1700341713141, 1700341713598 docker : 1700341717528, 1700341717961 docker : 1700341723568, 1700341724004 docker : 1700341734412, 1700341734893 docker : 1700341744392, 1700341744829 docker : 1700341777768, 1700341778206 docker : 1700341789957, 1700341790390 docker : 1700341798644, 1700341799110 docker : 1700341803069, 1700341803520 docker : 1700341803523, 1700341803966 docker : 1700341805161, 1700341805594 docker : 1700341808818, 1700341809284 docker : 1700341838615, 1700341839053 docker : 1700341846244, 1700341846676 docker : 1700341851378, 1700341851881 docker : 1700341852712, 1700341853145 docker : 1700341873992, 1700341874430 docker : 1700341890052, 1700341890497 docker : 1700341908661, 1700341909166 docker : 1700341911844, 1700341912285 docker : 1700341934051, 1700341934521 docker : 1700341978757, 1700341979198 docker : 1700342068588, 1700342069038 docker : 1700342103426, 1700342104179 docker : 1700342106153, 1700342106588 docker : 1700342118617, 1700342119047 docker : 1700342122929, 1700342123405 docker : 1700342127422, 1700342127852 docker : 1700342203474, 1700342203926 docker : 1700342207502, 1700342207933 docker : 1700342209092, 1700342209531 docker : 1700342214630, 1700342215061 docker : 1700342218602, 1700342219059 docker : 1700342241387, 1700342241858 docker : 1700342246286, 1700342246716 docker : 1700342247845, 1700342248278 docker : 1700342251824, 1700342252254 docker : 1700342275121, 1700342275558 docker : 1700342290457, 1700342291238 docker : 1700342304149, 1700342304598 docker : 1700342308937, 1700342309374 docker : 1700342343789, 1700342344230 docker : 1700342346998, 1700342347434 docker : 1700342382540, 1700342383105 docker : 1700342383109, 1700342383545 docker : 1700342419022, 1700342419459 docker : 1700342423378, 1700342423810 docker : 1700342428493, 1700342428925 docker : 1700342433989, 1700342434435 docker : 1700342439989, 1700342440430 docker : 1700342445189, 1700342445624 docker : 1700342450029, 1700342450476 docker : 1700342452423, 1700342453229 docker : 1700342453232, 1700342453665 docker : 1700342464009, 1700342464761 docker : 1700342478961, 1700342479394 docker : 1700342485376, 1700342485807 docker : 1700342519129, 1700342519570 docker : 1700342529498, 1700342529944 docker : 1700342538739, 1700342539181 docker : 1700342548254, 1700342548687 docker : 1700342585718, 1700342586154 docker : 1700342656476, 1700342656918 docker : 1700342691073, 1700342691504 docker : 1700342708154, 1700342708721 docker : 1700342757171, 1700342757653 docker : 1700342758051, 1700342758588 docker : 1700342775511, 1700342775955 docker : 1700342776336, 1700342776780 docker : 1700342782728, 1700342783159 docker : 1700342787481, 1700342787913 docker : 1700342789916, 1700342790370 docker : 1700342815887, 1700342816320 docker : 1700342823727, 1700342824163Faster clang-format (~31 secs for clang-format)
Workflow Telemetry - ClangFormat Check / Formatting Check
Workflow telemetry for commit 66f6736f1bb0aff522a5c6dd529775191c3ac7e5 You can access workflow job details here
Step Trace
gantt title Formatting Check dateFormat x axisFormat %H:%M:%S Set up job : milestone, 1700341522000, 1700341527000 Build DoozyX/[email protected] : 1700341527000, 1700341533000 Collect Workflow Telemetry : 1700341533000, 1700341533000 Run actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 : 1700341533000, 1700341540000 Run DoozyX/[email protected] : 1700341540000, 1700341556000 Verify Changed files : 1700341556000, 1700341557000 List all changed files tracked and untracked files : 1700341557000, 1700341557000 Run actions/upload-artifact@v3 : 1700341557000, 1700341557000 Post Verify Changed files : 1700341557000, 1700341557000CPU Metrics
Memory Metrics
IO Metrics
Read Write Network I/O Disk I/O Process Trace
Top 100 processes with highest duration
gantt title Formatting Check dateFormat x axisFormat %H:%M:%S node (checkout) : 1700341533443, 1700341540649 git : 1700341533580, 1700341533581 git : 1700341533588, 1700341533589 git : 1700341533592, 1700341533598 git : 1700341533600, 1700341533601 git : 1700341533604, 1700341533605 git : crit, 1700341533607, 1700341533608 git : 1700341533611, 1700341533639 git-submodule : 1700341533613, 1700341533638 git : 1700341533616, 1700341533619 gettext : 1700341533623, 1700341533623 git : 1700341533627, 1700341533628 git : 1700341533630, 1700341533631 git : 1700341533631, 1700341533632 git : 1700341533633, 1700341533634 git : 1700341533634, 1700341533635 git : 1700341533638, 1700341533639 git : crit, 1700341533641, 1700341533642 git : 1700341533643, 1700341533664 git-submodule : 1700341533645, 1700341533665 git : 1700341533647, 1700341533647 gettext : 1700341533651, 1700341533651 git : 1700341533655, 1700341533656 git : 1700341533657, 1700341533658 git : 1700341533658, 1700341533659 git : 1700341533659, 1700341533660 git : 1700341533661, 1700341533662 git : 1700341533664, 1700341533665 git : 1700341533667, 1700341533668 git : 1700341533673, 1700341539839 git : 1700341533675, 1700341539613 git-remote-http : 1700341533677, 1700341539613 git : 1700341537275, 1700341539828 git : 1700341539829, 1700341539836 git : 1700341539839, 1700341539840 git : 1700341539843, 1700341540634 git : 1700341540636, 1700341540637 git : 1700341540640, 1700341540641 docker : 1700341540688, 1700341556577 bash : 1700341556637, 1700341557591 bash : 1700341556639, 1700341557591 git : 1700341556640, 1700341557085 git : 1700341557088, 1700341557564 git : 1700341557566, 1700341557581 sort : 1700341557585, 1700341557586 bash : 1700341557613, 1700341557614 node (upload-artifact) : 1700341557621, 1700341557902 node (checkout) : 1700341557919, 1700341558068 git : 1700341557992, 1700341557993 git : 1700341558000, 1700341558001 git : crit, 1700341558004, 1700341558005 git : 1700341558007, 1700341558031 git-submodule : 1700341558009, 1700341558031 git : 1700341558012, 1700341558012 gettext : 1700341558016, 1700341558016 git : 1700341558019, 1700341558020 git : 1700341558021, 1700341558022 git : 1700341558022, 1700341558023 git : 1700341558024, 1700341558025 git : 1700341558025, 1700341558026 git : 1700341558029, 1700341558031 git : 1700341558033, 1700341558034 git : 1700341558036, 1700341558037 git : 1700341558039, 1700341558062 git-submodule : 1700341558041, 1700341558062 git : 1700341558043, 1700341558043 gettext : 1700341558047, 1700341558047 git : 1700341558051, 1700341558052 git : 1700341558052, 1700341558053 git : 1700341558054, 1700341558055 git : 1700341558056, 1700341558057 git : 1700341558057, 1700341558058 git : 1700341558060, 1700341558062The most interesting thing to look at here is the difference between the CPU usage graphs:
Current action Other faster action Take note that the metrics were only fetched each 5 secs, so the one running only a minute is not as nice to see. So, even with a two core runner (that is what github gives us right?), about half of one core is used, with half of it being system time. So I think its fair to think starting a docker container for the 3357 files takes some amount of time.
I'm not knowledgable enough about bash to be able to suggest the syntax to place the results of the find into whatever variable it takes to give it to the command (inside docker). But it seems its possible to have an exclude regex in the find too, https://stackoverflow.com/questions/6745401/regular-expression-to-exclude-filetypes-from-find. So all could be made in one pass. Last thing: are they some limits on how long the file list can be to be passed sucessfully? If yes, then a batching of a couple thousands should be fine, since 3357 files divided in three "jobs" was able to be run. (The other one uses multiprocessing to launch more than one). Thus, I expect a runtime of maybe 1min30 or less to be perfectly possible if only one docker invocation is used with one process, and no other optimization.
Originally posted by @echoix in https://github.com/jidicula/clang-format-action/issues/147#issuecomment-1817661437
I had one that I triggered with extra white space and the logs clearly showed me the number of processes files and fixed files. I'll try to find them in my fork, but since it's been months it'll need a new run to see the results.
Edit: see here: https://github.com/echoix/grass/actions/runs/6662606466/job/18816176322
It's not the same branch, but the original testing was done there (other changes were explored in this draft to help profile).
I reread your comment, and I think that maybe you're asking that you want the action to fail completely when they are files that were reformatted.
I can add that, of course, I see that I was missing it. But another possibility would be to commit it, but as we saw, without a PAT actions won't rerun. Other possibility would be to add revision comments if there isn't a lot of changes to make (that could be accepted and committed from the UI). I've never done it yet, but it could be possible if that's the easiest way of using it.
If you're talking about the case where the code is so bad that even clang-format itself fails and can't format? In that case, if the action fails, well it fails, and we won't have a little extra help with the changed files ready to simply paste back and commit. Honestly, if the code was so bad it shouldn't be merged as is, and we hope other checks will fail too ;)
...I think that maybe you're asking that you want the action to fail completely when they are files that were reformatted.
Yes, that's what I expect and meant to say. If formatting is wrong, fail so that it is clear the PR should not be merged.
Perhaps we can do smarter things, like a commit or preferably a comment, but it seems to me that these always need to be in addition to a failure which tell you clear "no, don't merge this". Clang check is set to Required to ensure that things are not merged with bad formatting. The original action does fail with bad formatting (at least it did in the past). I of course don't particularly care which workflow will ensure that as long as some workflow will.
Ok, let me get back to it, but probably not tonight :)
We may also remove this workflow, replace it with the use of pre-commit.ci, see #3255, which not only addresses this but also a number of other issues.
Is that PR/configuration almost ready? If it is run on another service, who "administers" it, ie, can anyone in the repo help out to improve/fix config when there's problems or the person isn't there anymore?
I didn't get to work on it yet for round 2, thus the draft status. Christmas parties are started ;)
Is that PR/configuration almost ready?
The GRASS repo is ready now, it only remains to activate it. The advantage is that all pre-commit hooks defined in https://github.com/OSGeo/grass/blob/main/.pre-commit-config.yaml is run on a (one) runner image just as it would locally. In my quick test fork (linked in #3255) all was finished below 2 min (often closer to 1 min).
Current tests:
- trim trailing whitespace[^1]
- fix end of files[^1]
- markdownlint
- black[^1]
- flake8
- clang-format[^1]
- yamllint
[^1]: may autofix a pull request
There is loads of CI time to save and – even more importantly – avoids discrepancies. With pre-commit.ci we may remove (or initially just disable) the 'Python Code Quality' (~9 min), 'ClangFormat check' (now ~20 min, with this PR ~1 min), 'General linting' (~8 min), with the added bonus of auto fixes.
If it is run on another service, who "administers" it, ie, can anyone in the repo help out to improve/fix config when there's problems or the person isn't there anymore?
This workflow is activated (once) on repository admin level, but is hence directed by .pre-commit-config.yaml. The infrastructure is located at https://github.com/pre-commit-ci.
And would the checks be executed on forks, ie, would be able to run on a PR inside my own fork, to make sure everything is ready before submitting it upstream? (It's harder to investigate and prepare changes when "production" is different from "staging", where staging would be our own repos)
And would the checks be executed on forks, ie, would be able to run on a PR inside my own fork, to make sure everything is ready before submitting it upstream?
If you activate pre-commit.ci on your fork, probably could. Anyways the checks are not enabled on forks as it now stands anyway. That's why you should use pre-commit locally, not drain unnecessary CI resources.
There's been some progress on the original action: https://github.com/jidicula/clang-format-action/issues/147#issuecomment-1868563950 and https://github.com/jidicula/clang-format-action/pull/174
Still a draft. The permission issue was probably the fact that the workflow isn't on main yet. I committed a copy on my main branch, and ran a PR with an error, and the comment appeared correctly with the suggestion. However, I ended up with a limitation of GitHub, where the way the suggestion is made, it can't be applied on deleted lines.
I still have another trick up my sleeve.
Ready for a review. This will be working once it is not running from a fork. See my working side-PR here: https://github.com/echoix/grass/pull/62
I might need to rebase to clean up the intermediate commits needed. Are we allowed to create a branch in the main repo in order to confirm it works the same from that repo?
The code suggestions:
When there are some changes, covered by code suggestions:
When there are multiple files that need to be formatted, but none of them end up in the diff_context (changed lines, plus or minus 3 lines around) that allows to add a suggestion, I add the message that says to take the artifact archive and copy-paste into the repo to replace with the new files.
If any of you knows how to have an exit code as a bash variable/env-var, without actually having an exit code that would fail the step, I could remove completely one action. For now, I keep tj-actions/verify-changed-files only for the boolean output since I wasn't able to use the exit code from git diff --exit-code in any way.
If any of you knows how to have an exit code as a bash variable/env-var,
The exit code of the previous command is stored in $?.
- Success is 0.
- Fail is 1, example:
cat doesnotexist.txt
cat: doesnotexist.txt: No such file or directory
echo $?
1
...Are we allowed to create a branch in the main repo in order to confirm it works the same from that repo?
In this case, yes, there is no other way. Similar to what the bots are doing, although here it is for an experiment and not for an automated PR.
If any of you knows how to have an exit code as a bash variable/env-var,
The exit code of the previous command is stored in
$?.
Success is 0.
Fail is 1, example:
cat doesnotexist.txt cat: doesnotexist.txt: No such file or directory echo $? 1
I remember seeing $? somewhere. I just have to make sure that having that fist failing command doesn't stop the execution of the workflow, since a command "failed".
...Are we allowed to create a branch in the main repo in order to confirm it works the same from that repo?
In this case, yes, there is no other way. Similar to what the bots are doing, although here it is for an experiment and not for an automated PR.
I have another idea. Can we work and agree to end up approving and merging the PR, but without the part of posting comments really enabled, and then enable it in another PR, but with the workflow already in main with the correct permissions?
Or we might want to go with a pull_request_target event, and pulling the code from the exterior fork, and try to run the formatting. I just have to verify how to do it safely, as that's the point of pull_request_target.
Can we work and agree to end up approving and merging the PR, but without the part of posting comments really enabled...?
Sure, go ahead, I don't recall this being a requirement. I though the original motivation was speed and that's also specified in the PR title.
So as it is right now:
- If there is no clang-format issues, there is no error, even if it is from a fork like now.
- If there is some files changed by clang-format, the list is shown on the logs, and also outputted in the job summary (when we open the run details, underneath the graph and where the artifacts can be downloaded).
- Also if there is a formatting error (files changed), then extra work is done: an action will try to post code review comments with suggestions. The step is configured to exit with a failure when it posts comments.
- Comments on lines of code can only be added up to 3 lines around the lines changed in a PR. So this action filters out what it cannot post. If it doesn't post comments, it doesn't fail with an error.
- If the workflow didn't fail at posting suggestions, and there was some files changed, then an extra step is ran to add more info in the job summary that says that all these files should be changed, and then exit with error.
- Since the files were already changed by the workflow, a step uploading them as an artifact is ran even if there is an error, in order to just apply them manually.
The only uncertainty left now, is that is the failure to post code review comments (95% sure due to permissions), is due to the fact that the action in the main branch didn't use the token with these permissions, or its because it is a PR from a fork? In both cases, the step that would fail fails, but for another reason. I suggest we try it out, as the worst is that the best-effort to simplify our approval of PRs didn't work, and is the same as not having it.
I also added a workflow_dispatch trigger as an escape for us: since when we (with access to the repo) run it manually, the actor is us and should work for a PR (if we are able to select the correct target), just a bit more work.
So I suggest that we merge it as is for now, in the seconds is way better than 20 mins.
A little cleanup of the squash message should be done for this.
I cleaned up the squash message in the auto merge
I'm eager to see if the action suggester works correctly once merged. If so, there's a lot of opportunities to use them in our CI, black for example, and even shellcheck+shfmt
I'll go and implement it for the grass-addons two, and merge it myself there, so it'll be already tested out in a more "staging"-like repo, since we seem a bit more precautious here.
The maximum token permissions for a PR coming from a fork is listed in the last column of the table here: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token
I'm working since this morning on a solution, after reading a lot. Even if technically, using pull_request_target and pulling the PR's branch could be argued safe here since we are only linting, it's not that clear that it will always be safe. I'm working on a more complete solution.
Yesterday night I managed to get something working as I'd like. I'll be cleaning this soon, and I made myself an organization to have an external fork to create a pull request against when testing, and this time it works.