llvm-project
llvm-project copied to clipboard
clang-format 14 is extremely slow
Formatting https://github.com/OpenMPT/openmpt/blob/072d4ccc83d842cfa34601dddf2c69a4cc889c74/src/mpt/arch/x86_amd64.hpp (~50kB) using https://github.com/OpenMPT/openmpt/blob/072d4ccc83d842cfa34601dddf2c69a4cc889c74/src/mpt/.clang-format with clang-format 14 takes about 14 minutes for me (CPU is a 3.4GHz AMD Piledriver).
I am using clang-format version 14.0.5 (as shipped with Visual Studio 2022 17.3.0) or Ubuntu clang-format version 14.0.6-++20220622053131+f28c006a5895-1~exp1~20220622173215.157 (from deb http://apt.llvm.org/focal/ llvm-toolchain-focal-14 main).
I do not see any excessive memory consumption, it's only CPU time.
The performance was fine with clang-format 13.
I could also provide other source files where performance is noticeably slow with clang-format 14 if required, however the given file is the worst offender.
I can "confirm" this problem. I just upgrade Qt Creator from 6.0.0 to 8.0.1, and suddenly it ate up all my CPUs - 99,7 % load when idle... Disabling the ClangFormat plugin dropped the CPU load to less than 1 % when idle.
Maybe related to https://bugs.llvm.org/show_bug.cgi?id=49519
@llvm/issue-subscribers-clang-format
@manxorist Could you please run this with QualifierAlignment set to Leave instead of Custom and remove QualifierOrder in your .clang-format?
manx@quadratus:~/projects/openmpt/trunk-1$ time clang-format-14 -i src/mpt/arch/x86_amd64.hpp
real 0m2.576s
user 0m2.555s
sys 0m0.020s
@codido So yes, changing QualifierAlignment to Leave makes it run in seconds and avoids the performance problem.
I'll probably use that as a work-around for now.
Qualifier Ordering along with a file with lots of #ifdef will likely be the combined cause
@brad0 since you added it to the 15.x milestone and it seems to be pretty high impact issue - I will move this from triage to needs fix. Next rc is released the 23rd of august if you want to get a fix in that one.
@mydeveloperday is anyone looking for a fix for this issue for 15.x?
@tru Sorry I did not see your response. I don't have a fix at the moment. I just added the ticket as I saw this referenced in another community and it seemed like a regression that would be good to fix for 15.x.
15.0.1 window is soon closing - any updates on a fix here?
Moving this to 15.0.2
Removing this from 15.x - let me know if there is a fix we can merge.
I also noticed this slow down. I do not think that it is necessarily or exclusively related to preprocessor macros such as #ifdef. I have a cpp file with ~13000 lines (650KB) without any preprocess macros (asides from includes). Formatting some small selection (~5 lines) from within Visual Studio 2022 17.4.0 with clang-format 14 or 15 without QualifierAlignment runs in approx. half a second (as it used to), while with QualifierAlignment: Custom and QualifierOrder: ['inline', 'static', 'constexpr', 'type', 'const', 'volatile' ] in the configuration file takes approx. 2 seconds. So 4x longer. A QualiferAlignment of Left or Right is similarly slower.
I did a bit of profiling myself. From what I could understand:
-
I noticed that even without any preprocessor directives in the source code, using anything but
QualifierAlignment: Leaveslows down clang-format noticeably. For example, take some file with like 20000 lines. WithLeavethe formatting is almost instantaneous, while any otherQualifierAlignmentsetting takes a considerable amount of time (seconds). The more elements are inQualifierOrder, the longer it takes. Apparently, theQualifierAlignmentFixergoes over the whole file once for every single element inQualifierOrder(see the loop overPasses). So the settingQualifierAlignment: Custom QualifierOrder: ['inline', 'static', 'constexpr', 'restrict', 'type', 'const', 'volatile' ](I think 6 passes) is MUCH slower than just
QualifierAlignment: Custom QualifierOrder: ['type', 'const' ](1 pass). So if you only really care about
const, use that setting. It is still notably slower thanQualifierAlignment: Leave, but maybe in some cases fast enough. ForQualifierAlignment: LeftandQualifierAlignment: RighttheQualifierOrdercontains 2 elements (constandvolatile). -
The problem multiplies in presence of nested preprocessor directives. As far as I understand, clang-format does some macro evaluation. In the example of the OP, clang-format seems to generate 289 "variations" of the input file. (The
UnwrappedLinescontains 289 elements, and each element seems to represent a version of the input file.) In other words, clang-format creates 289 copies of the file, and each time theQualifierAlignmentFixerthen goes another N times over the whole file, where N is the number of elements inQualifierOrder. So this scales very poorly, leading to the extreme run times observed by the OP.
I did some more digging and I think I understand the problem better and came up with a potential patch. My previous analysis was not complete.
In clang::format::internal::reformat() the code iterates over all individual passes (where a pass is a "formatting job"), one of which is QualifierAlignmentFixer(...).process(). This process() call is actually calling the generic implementation TokenAnalyzer::process(). TokenAnalyzer::process() iterates over UnwrappedLines, which in the example of the OP contains 289 entries. Each entry represents 1 variant of the input file, generated by evaluating preprocessor directives differently. For each entry, it calls QualifierAlignmentFixer::analyze() which does a loop over a fixer for every qualifier (in the code also called "Passes"). It calls LeftRightQualifierAlignmentFixer(...).process() each time, which again is the generic implementation in TokenAnalyzer::process(). There it again iterates over all 289 entries in UnwrappedLines, and calls LeftRightQualifierAlignmentFixer::analyze() for each.
So, boiled down, the structure looks like this:
for each of the 289 versions of the file in UnwrappedLines:
for each configured qualifier:
for each of the 289 versions of the file in UnwrappedLines:
LeftRightQualifierAlignmentFixer::analyze()
This obviously scales quadratic with the number of preprocessor branches, causing the terrible performance. Generally speaking, it seems to be a bad idea to use a TokenAnalyzer within another TokenAnalyzer since this will always cause this double loop over UnwrappedLines.
I currently do not really see a good reason for the inner loop over the UnwrappedLines. So I attempted to fix the performance issue and created a potential patch Patch_57117.patch. It does the following:
- It gets rid of the the double loop over
UnwrappedLinesby adding the passes for the individual qualifiers (i.e. the calls toLeftRightQualifierAlignmentFixer(...).process()) to the list of passes inclang::format::internal::reformat()directly, instead of using an intermediateQualifierAlignmentFixer. Hence,QualifierAlignmentFixerbecame obsolete and is removed. - This change bypasses also the code in
QualifierAlignmentFixer::analyze(). However, the code seems to have been copy & paste fromclang::format::internal::reformat()anyway, except the part at the end where it filters out replacements that do nothing ("NoOpFixes"). I originally tried to move that code to the individual qualifier analysis (LeftRightQualifierAlignmentFixer::analyze()), but this caused many test failures because the NoOpFixes-code relies on the fixes for all qualifiers to have been merged. So instead, in the patch I moved the code to the top level functionclang::format::internal::reformat(). Of course, this then potentially affects the output of clang-format always, i.e. for all passes, not just for the qualifier fixer. I am really unsure of the implications, but maybe it is a good idea to filter replacements that do nothing always? In any case, all tests pass. - Regarding the "NoOpFixes" code, the original commit that introduced it also added a test (
QualifierFixerTest.NoOpQualifierReplacements). Unfortunately, the test got broken at some point since it no longer tests theReplacementCountof the firstverifyFormat. So my patch fixes this in order to test the effect of the "NoOpFixes" code again.
Formatting the x86_amd64.hpp with the given .clang-format file of the OP:
- Without the patch: 320s
- With the patch: 3s
- For comparison, with
QualifierAlignment: Leave: 1.1s
So an increase of performance by two orders of magnitude 😃 The resulting files are identical.
Any opinions on the patch from someone with any clang-format experience, maybe @mydeveloperday? Especially regarding the NoOpFixes code that I moved to clang::format::internal::reformat()? Or should I just submit it for review?
I just submitted a fix for review: https://reviews.llvm.org/D153228
Or should I just submit it for review?
The answer is always: yes. :) I for one don't look here at github (sorry, no time), but reviews I look at all (clang-format related), most of the time I also post something.