qmcpack Preliminary assessment of static-analysis warnings (clang-tidy)

Describe the bug

After a brief conversation on Slack I ran a blanket static analysis check on the whole source with clang-tidy.

I found about ~~2'294'900 (yes, 2 x 10^6)~~ 242'406 unique warnings. (attached file)

This is an absolute upper bound, it is possible that only one order of magnitude less warnings actually deserve any attention.

From my experience from other code: I estimate that about half can be automatically fix by clang-tidy (yes, the tool offers automatic solutions by doing text replacements in the source code. In my experience the tools offers fixes only when it is safe.)

Of the remaining, maybe half could be considered false positives or style related.

Maybe a bunch will solve unknown bugs or undefined behavior.

In my opinion, almost all warnings (~99%) have a positive impact if fixed but there is a decision to make to which I what point to stop. (not counting the checks that should be disable IMO, listed below).

To Reproduce

clone repo

cd build
export OMPI_CXX=clang++
export OMPI_CC=clang
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DBUILD_AFQMC=1 -DQMC_MIXED_PRECISION=1 -DCMAKE_BUILD_TYPE=Debug -DMPIEXEC_PREFLAGS="--allow-run-as-root;--bind-to;none" -DCMAKE_CXX_CLANG_TIDY="clang-tidy;-checks=*"  ..
make 2>&1 | grep warning | tee clang-tidy-warnings.txt

Sit tight
Inspect sort clang-tidy-warnings.txt | uniq (file attached) clang-tidy-warns.zip

Expected behavior

No warnings

System:

domestic Ubuntu 21.10 compiling with clang.

Additional context

There is no reason to think that any of the warnings are critical. To begin I think that this checks are for the most part useless and misleading.

(.clang-tidy)

Checks: '*,
  -readability-magic-numbers,
  -modernize-use-nodiscard,
  -altera-id-dependent-backward-branch,
  -altera-struct-pack-align,
  -altera-unroll-loops,
  -cert-err58-cpp,
  -cppcoreguidelines-avoid-non-const-global-variables,
  -cppcoreguidelines-macro-usage,
  -cppcoreguidelines-pro-bounds-array-to-pointer-decay,
  -cppcoreguidelines-pro-type-vararg,
  -cppcoreguidelines-avoid-magic-numbers,
  -fuchsia-default-arguments-calls,
  -fuchsia-trailing-return,
  -fuchsia-statically-constructed-objects,
  -fuchsia-overloaded-operator,
  -google-runtime-references,
  -hicpp-vararg,
  -hicpp-no-array-decay,
  -llvm-header-guard,
  -llvmlibc-restrict-system-libc-headers,
  -llvmlibc-implementation-in-namespace,
  -llvmlibc-callee-namespace
'
WarningsAsErrors: '*'
HeaderFilterRegex: '.'
AnalyzeTemporaryDtors: false
FormatStyle: file

Apr 10 '22 01:04 correaa

Q1. If you rerun with AFQMC disabled, what is the new warning count? It would be helpful to know the split between auxiliary and real space warnings.

Q2. Can you see how many magic numbers are used? These should all be turned into named constants and systematized over time. It would be helpful to look out for any pi, e etc. that are not at full precision. We fixed a few of these some month back, but more may exist.

Apr 10 '22 02:04 prckent

Q1. If you rerun with AFQMC disabled, what is the new warning count? It would be helpful to know the split between auxiliary and real space warnings.

A1. This is very strange I though that by doing grep AFQMC on the output file I would see the warnings from AFQMC, at least because of the path. But almost nothing appeared. There is something fishy about that, either clang-tidy didn't finish or for some reason the AFQMC part was not compiled.

To confirm that, I am running again with:

cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DBUILD_AFQMC=0 -DQMC_MIXED_PRECISION=1 -DCMAKE_BUILD_TYPE=Debug -DMPIEXEC_PREFLAGS="--allow-run-as-root;--bind-to;none" -DCMAKE_CXX_CLANG_TIDY="clang-tidy;-checks=*"  ..

make 2>&1 | grep warning | tee clang-tidy-warnings_noAFQMC.txt

Q2. Can you see how many magic numbers are used? These should all be turned into named constants and systematized over time. It would be helpful to look out for any pi, e etc. that are not at full precision. We fixed a few of these some month back, but more may exist.

A2. Magic numbers as in "cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers", there about 100`000 of those!, not sure if there is an automatic fix for that (it would defeat the purpose of the warning if these is an automatic way to name the magic numbers).

$ grep magic clang-tidy-warnings.txt | wc --lines
22'150

At 30 seconds per case, an average grad student will take 120 hours to fix. Assuming he/she will come up with good names for the constants very quickly.

It is really depressing to see the magnitude of warnings. With Multi which is a code I had control from the start, it took me about 2 weeks to solve a couple of thousand warnings, of which about 10 or so forced me to refactor code seriously, changes that took at least a few hours each.

In perspective I am glad I did, because the code improved a lot, however this is at another scale. Also, (generic) library code is different than "final" code, a library should be correct in many enviroments while final code should simply produce correct executables in the supported platforms.

Here it is a sample of the first few warnings of magic numbers,

$ grep magic clang-tidy-warnings.txt | head       
/home/correaa/prj/qmcpack/src/Platforms/Host/sysutil.cpp:54:10: warning: 32 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Platforms/Host/sysutil.cpp:55:15: warning: 32 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Platforms/MemoryUsage.cpp:29:38: warning: 30 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Platforms/MemoryUsage.cpp:35:71: warning: 7 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Platforms/MemoryUsage.cpp:35:91: warning: 20 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Platforms/MemoryUsage.cpp:36:71: warning: 7 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Platforms/MemoryUsage.cpp:36:92: warning: 10 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Message/Communicate.cpp:95:82: warning: 20 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Containers/OhmmsPETE/AntiSymTensor.h:102:64: warning: 0.5 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]
/home/correaa/prj/qmcpack/src/Containers/OhmmsPETE/SymTensor.h:96:7: warning: 5 is a magic number; consider replacing it with a named constant [cppcoreguidelines-avoid-magic-numbers,readability-magic-numbers]

Apr 10 '22 03:04 correaa

 $ grep warning make_yesAFQMC.txt | sort | uniq | wc --lines
242406

 $ grep warning make_noAFQMC.txt | sort | uniq | wc --lines
195239

Apr 11 '22 01:04 correaa

Not clear how to filter this to what would actually improve the code. Tons and tons of header include warnings for instance, but I can't figure what is wrong about them _HPP vs _H? Thousands and thousands of complains about OHMMS related files.

Tried to drill into the magic number thing mentioned above. Found a lot of issues in src/Numerics/tests/test_cartesian_tensor. Here's an example of what it thinks about a single line of that file:

REQUIRE(ct.getHessYlm(32)(1, 2) == Approx(8.46129914327));

home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:248:28: warning: 'operator()' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:248:35: warning: 'operator==Catch::Detail::Approx' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:248:35: warning: 'operator==<double, void>' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:248:3: warning: kernel performance could be improved by unrolling this loop with a '#pragma unroll' directive [altera-unroll-loops] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:248:3: warning: 'operator""_catch_sr' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:248:3: warning: 'operator<=' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:249:28: warning: 'operator()' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:249:35: warning: 'operator==Catch::Detail::Approx' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:249:35: warning: 'operator==<double, void>' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:249:3: warning: kernel performance could be improved by unrolling this loop with a '#pragma unroll' directive [altera-unroll-loops] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:249:3: warning: 'operator""_catch_sr' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:249:3: warning: 'operator<=' must resolve to a function declared within the '__llvm_libc' namespace [llvmlibc-callee-namespace] /home/correaa/prj/qmcpack/src/Numerics/tests/test_cartesian_tensor.cpp:249:45: warning: 8.46129914327 is a magic number; consider replacing it with a named constant

Are the llvmlibc-callee-namespace warnings something worth chasing? Should we be unrolling loops based on its advice? Finally, is it a benefit here to rename all of these constants in this list of checks?

Not opposed to utilizing this sort of tool to improve the code, just lost as to what is actually important / actionable. Help.

Apr 12 '22 20:04 lshulen

Absolutely, I agree. In the first message I made a list of warnings that I would disable right away because 1) I cannot do anything about (e.g. depends on the testing library), 2) do not make any sense.

I got to the conclusion that llvmlibc-* warnings do not make any sense, I think those are for llvm library developers. Some fuchia-* warnings are mostly contradictory with other warnings.

In general, each warning is well documented and explained here: https://clang.llvm.org/extra/clang-tidy/checks/list.html

Some warnings are also about industry conventions (e.g. header guards naming). These and other warnings are low priority but also have low cost because they can be automatically fixed by text replacement.

Apr 12 '22 21:04 correaa

Thanks, I'll dig more into the warning types.

Apr 12 '22 21:04 lshulen

As discussed above, the real question here is which warnings indicate (i) dangerous code and potentially hidden bugs (ii) bad practices that we should avoid (subjective) (iii) ignorable or irrelevant warnings. There is hopefully some low hanging fruit that will get rid of many, but category (i) should have priority.

e.g. Regarding magic numbers, if we have some code which,say, checks if the number of points is 5, then this is clear and there is nothing to be gained with a named constant. Same goes for arbitrary check values in tests as Luke mentioned. However, floating point tolerances (if (a<0.001)) would benefit from being handled more consistently throughout and given good names. And previously we found a hard-coded pi which did not have full precision.

Apr 13 '22 15:04 prckent

Apr 14 '22 19:04 correaa

Few more datapoints:

 $ grep bugprone make_yesAFQMC.txt | sort | uniq | wc -l
4349
 $ grep cppcoreguidelines make_yesAFQMC.txt | sort | uniq | wc -l
47031

Apr 14 '22 20:04 correaa

Could we get a count of how many of each individual check in each category? grep bugprone make_yesAFQMC.txt | awk --field-separator '[' '{print $2}' | sort | uniq -c

Running this on just one file (QMCCostFunctionBatched.cpp, and these results also have the non-system includes from this file) gives

      2 bugprone-branch-clone]
    106 bugprone-easily-swappable-parameters]
      1 bugprone-exception-escape]
      3 bugprone-implicit-widening-of-multiplication-result]
      2 bugprone-incorrect-roundings]
     51 bugprone-macro-parentheses]
     79 bugprone-narrowing-conversions]
      5 bugprone-reserved-identifier]
      2 bugprone-suspicious-include]
      5 bugprone-unhandled-self-assignment]

Apr 14 '22 20:04 markdewing

Thank @markdewing, I was to lazy to write the "histogram" command. (note: I still can't upload the make_yesAFQMC.zip file in github).

$  grep bugprone make_yesAFQMC.txt | awk --field-separator '[' '{print $2}' | sort | uniq -c | sort -nr
  37621 bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions]
  30296 bugprone-easily-swappable-parameters]
  28112 bugprone-macro-parentheses]
   4410 bugprone-implicit-widening-of-multiplication-result]
   3364 bugprone-reserved-identifier,cert-dcl37-c,cert-dcl51-cpp]
   2052 bugprone-unhandled-self-assignment,cert-oop54-cpp]
   1058 bugprone-suspicious-include]
    631 bugprone-exception-escape]
    513 bugprone-branch-clone]
     64 bugprone-misplaced-widening-cast]
     40 bugprone-move-forwarding-reference]
     12 bugprone-integer-division]
      6 bugprone-use-after-move,hicpp-invalid-access-moved]
      6 bugprone-fold-init-type]
      1 bugprone-unused-raii]
      1 bugprone-copy-constructor-init]

Apr 14 '22 22:04 correaa

Now that we narrowed down the warnings. You can look at the explanations here https://clang.llvm.org/extra/clang-tidy/checks/list.html

However this my 2-cents experience with the listed ones:

  37621 bugprone-narrowing-conversions,cppcoreguidelines-narrowing-conversions]

This is triggered by conversions between signed and non-signed integers or integers of different sizes in a ways that they can overflow, or convert negative numbers into large numbers. My perceived severity: This can potentially lead (in the future) to bugs. My typical solutions: use static_cast to make conversion more explicit. Opportunity to use safe_cast-type functions.

It is also triggered by double to float conversions. My perceived severity: These are rarely bugs IMO, at worst can create a gratuitous lost of precision and an unnecessary waste of cycles converting between floating point precision. My typical solutions: use static_cast or be more consistent and work in double precision only.

  30296 bugprone-easily-swappable-parameters]

Triggered by things like void f(int, int, int, bool, bool, bool); My typical solutions: make sure that parameter names are easily distinguished in the declaration, look for points where the function is used and make sure that variable names are also descriptive. Gold plating solution: use strong typing or exploit existing already abstracted types.

  28112 bugprone-macro-parentheses]

For example #define plusone(x) x + 1. Should be #define plusone(x) (x) + 1

This is a must, parameters in macros should have parenthesis otherwise associativity rules in expressions can change its meaning. This probably has an automatic fix and could break already buggy code (which is good).

   4410 bugprone-implicit-widening-of-multiplication-result]

This is almost never a bug I think but it can produce unnecessary conversions in the middle of otherwise efficient multiplications preventing optimizations. Solution: static_cast to desired result type (in general long) individual terms in a multiplication.

   3364 bugprone-reserved-identifier,cert-dcl37-c,cert-dcl51-cpp]

Someone is using variable names starting with __, these are reserved for standard library and it is technically UB in all cases.

   2052 bugprone-unhandled-self-assignment,cert-oop54-cpp]

Dynamic objects should contemplate the case of self-assigment. Sometimes these are false positives, however this bug is so pervasive that it is worth explaining how self assignment is handled. My typical solution: Assignment is one of the most tricky functions to implement! if you don't understand how is it done at the least have this as the first line of its implementation(if( this = &other) {return *this;}).

   1058 bugprone-suspicious-include]

I see this when people try to include files that are not header files, e.g. cpp files. If you need to do that, at least add a comment like this: // NOLINT(bugprone-suspicious-include) explanation why this unconventional file is textually included here

    631 bugprone-exception-escape]

I never saw this. Good excuse to rethink exception handling.

    513 bugprone-branch-clone]

???

     64 bugprone-misplaced-widening-cast]

???

     40 bugprone-move-forwarding-reference]

Know the difference between std::forward<T> and std::move. Rethink if move operation is correct.

     12 bugprone-integer-division]

This should be a warning by the compiler already. There is an unchecked path that can lead to division by zero. Division by zero terminates the code. The logic of the program should be able to prove that the zero division doesn't occur. At worst add an assert before the division expression; at best rethink the logic of the function to prevent the instruction to be executed in the zero-case. Help the compiler see-through the logic so it can prove that it never happens. Yes, compilers can do that.

      6 bugprone-use-after-move,hicpp-invalid-access-moved]

Don't use a variable after std::move(var) and before reassigment var = ...;. Some classes can, by design, be used after move but in this case a comment is necessary.

      6 bugprone-fold-init-type]

???

      1 bugprone-unused-raii]

??? Looks interesting.

      1 bugprone-copy-constructor-init]

???

Apr 14 '22 22:04 correaa

bugprone-narrowing-conversions It is also triggered by double to float conversions.

I have a suspicion is that we have a few of these lurking that actually are bugs. This is based on observing the variation in some of our numerical results and their apparent final precision. We might also have some explicit conversions lurking that are wrong.

May 04 '22 19:05 prckent

On Wed, May 4, 2022 at 12:06 PM Paul R. C. Kent @.***> wrote:

| bugprone-narrowing-conversions | It is also triggered by double to float conversions.

I have a suspicion is that we have a few of these lurking that actually are bugs. This is based on observing the variation in some of our numerical results and their apparent final precision. We might also have some explicit conversions lurking that are wrong.

I think the safest and cheapest possible solution is to eliminate all floats from the code. Starting from the assignment conversion that triggers the warning. Then, in code templates static_assert( not std::is_same_v<T, float> ).

Thank you, Alfredo

Message ID: @.***>

May 04 '22 19:05 correaa