llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

clang-format 14 is extremely slow

Open manxorist opened this issue 3 years ago • 6 comments

Formatting https://github.com/OpenMPT/openmpt/blob/072d4ccc83d842cfa34601dddf2c69a4cc889c74/src/mpt/arch/x86_amd64.hpp (~50kB) using https://github.com/OpenMPT/openmpt/blob/072d4ccc83d842cfa34601dddf2c69a4cc889c74/src/mpt/.clang-format with clang-format 14 takes about 14 minutes for me (CPU is a 3.4GHz AMD Piledriver). I am using clang-format version 14.0.5 (as shipped with Visual Studio 2022 17.3.0) or Ubuntu clang-format version 14.0.6-++20220622053131+f28c006a5895-1~exp1~20220622173215.157 (from deb http://apt.llvm.org/focal/ llvm-toolchain-focal-14 main).

I do not see any excessive memory consumption, it's only CPU time.

The performance was fine with clang-format 13.

I could also provide other source files where performance is noticeably slow with clang-format 14 if required, however the given file is the worst offender.

manxorist avatar Aug 12 '22 10:08 manxorist

I can "confirm" this problem. I just upgrade Qt Creator from 6.0.0 to 8.0.1, and suddenly it ate up all my CPUs - 99,7 % load when idle... Disabling the ClangFormat plugin dropped the CPU load to less than 1 % when idle.

Maybe related to https://bugs.llvm.org/show_bug.cgi?id=49519

hansfn avatar Aug 12 '22 12:08 hansfn

@llvm/issue-subscribers-clang-format

llvmbot avatar Aug 12 '22 13:08 llvmbot

@manxorist Could you please run this with QualifierAlignment set to Leave instead of Custom and remove QualifierOrder in your .clang-format?

codido avatar Aug 12 '22 18:08 codido

manx@quadratus:~/projects/openmpt/trunk-1$ time clang-format-14 -i src/mpt/arch/x86_amd64.hpp
real    0m2.576s
user    0m2.555s
sys     0m0.020s

@codido So yes, changing QualifierAlignment to Leave makes it run in seconds and avoids the performance problem.

I'll probably use that as a work-around for now.

manxorist avatar Aug 12 '22 18:08 manxorist

Qualifier Ordering along with a file with lots of #ifdef will likely be the combined cause

mydeveloperday avatar Aug 14 '22 14:08 mydeveloperday

@brad0 since you added it to the 15.x milestone and it seems to be pretty high impact issue - I will move this from triage to needs fix. Next rc is released the 23rd of august if you want to get a fix in that one.

tru avatar Aug 15 '22 07:08 tru

@mydeveloperday is anyone looking for a fix for this issue for 15.x?

tru avatar Aug 22 '22 06:08 tru

@tru Sorry I did not see your response. I don't have a fix at the moment. I just added the ticket as I saw this referenced in another community and it seemed like a regression that would be good to fix for 15.x.

brad0 avatar Aug 26 '22 02:08 brad0

15.0.1 window is soon closing - any updates on a fix here?

tru avatar Sep 15 '22 07:09 tru

Moving this to 15.0.2

tru avatar Sep 19 '22 06:09 tru

Removing this from 15.x - let me know if there is a fix we can merge.

tru avatar Oct 07 '22 06:10 tru

I also noticed this slow down. I do not think that it is necessarily or exclusively related to preprocessor macros such as #ifdef. I have a cpp file with ~13000 lines (650KB) without any preprocess macros (asides from includes). Formatting some small selection (~5 lines) from within Visual Studio 2022 17.4.0 with clang-format 14 or 15 without QualifierAlignment runs in approx. half a second (as it used to), while with QualifierAlignment: Custom and QualifierOrder: ['inline', 'static', 'constexpr', 'type', 'const', 'volatile' ] in the configuration file takes approx. 2 seconds. So 4x longer. A QualiferAlignment of Left or Right is similarly slower.

Sedeniono avatar Nov 16 '22 09:11 Sedeniono

I did a bit of profiling myself. From what I could understand:

  • I noticed that even without any preprocessor directives in the source code, using anything but QualifierAlignment: Leave slows down clang-format noticeably. For example, take some file with like 20000 lines. With Leave the formatting is almost instantaneous, while any other QualifierAlignment setting takes a considerable amount of time (seconds). The more elements are in QualifierOrder, the longer it takes. Apparently, the QualifierAlignmentFixer goes over the whole file once for every single element in QualifierOrder (see the loop over Passes). So the setting

      QualifierAlignment: Custom
      QualifierOrder: ['inline', 'static', 'constexpr', 'restrict', 'type', 'const', 'volatile' ]
    

    (I think 6 passes) is MUCH slower than just

      QualifierAlignment: Custom
      QualifierOrder: ['type', 'const' ]
    

    (1 pass). So if you only really care about const, use that setting. It is still notably slower than QualifierAlignment: Leave, but maybe in some cases fast enough. For QualifierAlignment: Left and QualifierAlignment: Right the QualifierOrder contains 2 elements (const and volatile).

  • The problem multiplies in presence of nested preprocessor directives. As far as I understand, clang-format does some macro evaluation. In the example of the OP, clang-format seems to generate 289 "variations" of the input file. (The UnwrappedLines contains 289 elements, and each element seems to represent a version of the input file.) In other words, clang-format creates 289 copies of the file, and each time the QualifierAlignmentFixer then goes another N times over the whole file, where N is the number of elements in QualifierOrder. So this scales very poorly, leading to the extreme run times observed by the OP.

Sedeniono avatar May 11 '23 19:05 Sedeniono

I did some more digging and I think I understand the problem better and came up with a potential patch. My previous analysis was not complete.

In clang::format::internal::reformat() the code iterates over all individual passes (where a pass is a "formatting job"), one of which is QualifierAlignmentFixer(...).process(). This process() call is actually calling the generic implementation TokenAnalyzer::process(). TokenAnalyzer::process() iterates over UnwrappedLines, which in the example of the OP contains 289 entries. Each entry represents 1 variant of the input file, generated by evaluating preprocessor directives differently. For each entry, it calls QualifierAlignmentFixer::analyze() which does a loop over a fixer for every qualifier (in the code also called "Passes"). It calls LeftRightQualifierAlignmentFixer(...).process() each time, which again is the generic implementation in TokenAnalyzer::process(). There it again iterates over all 289 entries in UnwrappedLines, and calls LeftRightQualifierAlignmentFixer::analyze() for each.

So, boiled down, the structure looks like this:

for each of the 289 versions of the file in UnwrappedLines:
    for each configured qualifier:
        for each of the 289 versions of the file in UnwrappedLines:
            LeftRightQualifierAlignmentFixer::analyze()

This obviously scales quadratic with the number of preprocessor branches, causing the terrible performance. Generally speaking, it seems to be a bad idea to use a TokenAnalyzer within another TokenAnalyzer since this will always cause this double loop over UnwrappedLines.

I currently do not really see a good reason for the inner loop over the UnwrappedLines. So I attempted to fix the performance issue and created a potential patch Patch_57117.patch. It does the following:

  • It gets rid of the the double loop over UnwrappedLines by adding the passes for the individual qualifiers (i.e. the calls to LeftRightQualifierAlignmentFixer(...).process()) to the list of passes in clang::format::internal::reformat() directly, instead of using an intermediate QualifierAlignmentFixer. Hence, QualifierAlignmentFixer became obsolete and is removed.
  • This change bypasses also the code in QualifierAlignmentFixer::analyze(). However, the code seems to have been copy & paste from clang::format::internal::reformat() anyway, except the part at the end where it filters out replacements that do nothing ("NoOpFixes"). I originally tried to move that code to the individual qualifier analysis (LeftRightQualifierAlignmentFixer::analyze()), but this caused many test failures because the NoOpFixes-code relies on the fixes for all qualifiers to have been merged. So instead, in the patch I moved the code to the top level function clang::format::internal::reformat(). Of course, this then potentially affects the output of clang-format always, i.e. for all passes, not just for the qualifier fixer. I am really unsure of the implications, but maybe it is a good idea to filter replacements that do nothing always? In any case, all tests pass.
  • Regarding the "NoOpFixes" code, the original commit that introduced it also added a test (QualifierFixerTest.NoOpQualifierReplacements). Unfortunately, the test got broken at some point since it no longer tests the ReplacementCount of the first verifyFormat. So my patch fixes this in order to test the effect of the "NoOpFixes" code again.

Formatting the x86_amd64.hpp with the given .clang-format file of the OP:

  • Without the patch: 320s
  • With the patch: 3s
  • For comparison, with QualifierAlignment: Leave: 1.1s

So an increase of performance by two orders of magnitude 😃 The resulting files are identical.

Any opinions on the patch from someone with any clang-format experience, maybe @mydeveloperday? Especially regarding the NoOpFixes code that I moved to clang::format::internal::reformat()? Or should I just submit it for review?

Sedeniono avatar May 13 '23 17:05 Sedeniono

I just submitted a fix for review: https://reviews.llvm.org/D153228

Sedeniono avatar Jun 18 '23 15:06 Sedeniono

Or should I just submit it for review?

The answer is always: yes. :) I for one don't look here at github (sorry, no time), but reviews I look at all (clang-format related), most of the time I also post something.

HazardyKnusperkeks avatar Jun 18 '23 19:06 HazardyKnusperkeks