New check: use memcpy/memset instead of loop
I'd like to propose a new performance check. I noticed in 2 projects some bad pattern: instead of calling memcpy authors prefer hand-made for loop with copying byte-by-byte.
Examples: https://github.com/openvinotoolkit/openvino/pull/10996/files https://github.com/openscad/openscad/blob/master/src/ext/lodepng/lodepng.cpp#L120
I tested 3 modern compilers: gcc, msvc, clang (all with O2 key). Only clang generates something looks like a vectorized version (in other words it recognized the pattern). gcc and msvc used movzx opcode.
I still not sure where is the best place for this check, due it works not only for c++.
Yes as you say this does not belong in CheckStl. Not sure where it's better. I am thinking about CheckFunction or CheckOther. CheckOther is very large already so therefore I prefer that you use CheckFunction.
I am thinking that it might be possible with some false positives:
for (i = 0; i < count; i += 2) { .... } // FP increment with 2 for (....) { a[i1] = b[i2]; } // FP different indexes (use memset instead?) for (i = 0; b[i] != 0; i++) { ... } // FP unknown count
Thank you for feedback. I guess I'll try to figure out how to use extractForLoopValues function here. 2nd case should be with cast expressions, just clarification.
Can you please run the tools/test-my-pr.py script? Do you use Windows? I am not sure if that script works well in windows.
I have both, and Linux is more preferable for development purposes 🐧
And one more question, hand-made memset, shall we check it too?
And one more question, hand-made memset, shall we check it too?
I am not against it.
For information.. I do feel a bit skeptic about the performance category. Cppcheck can't know what difference each suggestion will make. I believe that in general, readability is more important than micro optimisations.
However for this particular check that you have implemented.. in my humble opinion it should improve readability to use memcpy instead of a raw loop.
For memset it can be questionable if it improves readability in some cases.
For information.. I do feel a bit skeptic about the performance category. Cppcheck can't know what difference each suggestion will make. I believe that in general, readability is more important than micro optimisations.
However for this particular check that you have implemented.. in my humble opinion it should improve readability to use memcpy instead of a raw loop.
For memset it can be questionable if it improves readability in some cases.
You suggesting to treat it as useStlAlgorithm? I guess it's ok. I still could left a comment about performance in the message.
My motivation about memset is a similar to memcpy: vectorized code should be much faster. But in this way it should be implemented for all standard functions, including detecting bubble sort (laugh)
Maybe the other question is more important: is this check OK in CppCheck philosophy. I don't want to remove check later.
I tested 3 modern compilers: gcc, msvc, clang (all with
O2key). Only clang generates something looks like a vectorized version (in other words it recognized the pattern). gcc and msvc usedmovzxopcode.
If the compilers produce sub-optimal code it would make sense to report this upstream.
For LLVM this should be easy since they are also on GitHub - GCC and MSVC need additional accounts.
I tested 3 modern compilers: gcc, msvc, clang (all with
O2key). Only clang generates something looks like a vectorized version (in other words it recognized the pattern). gcc and msvc usedmovzxopcode.If the compilers produce sub-optimal code it would make sense to report this upstream.
For LLVM this should be easy since they are also on GitHub - GCC and MSVC need additional accounts.
I don't want to call it 'suboptimal', this code is doing exactly what the user asking for: byte by byte copying. I don't want to blame the tools in this case.
I don't want to call it 'suboptimal', this code is doing exactly what the user asking for: byte by byte copying. I don't want to blame the tools in this case.
It's not blaming. The compilers try to detect such pattern which will internally replaced with "lib" calls which can do a better code optimization which might be enable to omit more simplified call and avoid the library call completely. Same is done for pattern which can be modeled as instructions instead.
As an example look at the recent work for for ``memchr(): https://github.com/llvm/llvm-project/commit/516915beb5ee5012a9a8b162fc29664a8c247ec3 https://github.com/llvm/llvm-project/commit/e263a7670e28d880ec45971f91fa88de01fc51e1 https://github.com/llvm/llvm-project/commit/5ccfd5f6d43069a5bc152b335bc704c6d2584ddc
I also remember a detection for a common pattern which occurs in zlib which lead to a 10% performance increase when compiled with clang: https://github.com/llvm/llvm-project/commit/d870aea03ecb2387c215afb16f5e2073718d5aa4
In case of memcpy there might not be as much (or any) room but might be still worth pointing it out.
I'm not aware of clang internals, if you believe that it should be reported, could you please report it. Not because I'm lazy, just because you know how it report properly, like how it should work. And also it's effect of running inside godbolt, not in regular environment.
But in general I don't want to rely on compiler optimizations, it's a free bonus but also it's a vendor lock and hard to control the result.
FYI https://github.com/llvm/llvm-project/issues/56467
FYI https://github.com/llvm/llvm-project/issues/56662 - so even with the successful detection of memset() there might still be issues.
FYI https://github.com/llvm/llvm-project/issues/47852
FYI https://github.com/llvm/llvm-project/commit/e079bf655832b2261faf6883984400a8c37b1baa
Will be the last one - sorry for spamming. Just highlighting what a compiler is doing internally for such calls so it's not always necessary to write the best possible code just to get the best possible performance.
So please consider filing these shortcomings upstream. As you need to follow up on these things it doesn't make sense that way for me to file these either since I did not encounter these issues. Thanks.
Hmm, I little bit confused. I have a line for (size_t i = 0; i < 64*(sizeof(int) + 42); i++) { in testcase. And a token (, from here: 64*(sizeof has a tokType() == 17, which is a eExtendedOp enum entry. Not too much cases of usage this tokType in the project. I guess I need some clarification.
In token.cpp:145-146 there are lines:
else if (mStr.size() == 1 && mStr.find_first_of(",[]()?:") != std::string::npos)
tokType(eExtendedOp);
Brackets has such set expression in token.cpp:170-171:
else if (mStr.size() == 1 && (mStr.find_first_of("{}") != std::string::npos || (mLink && mStr.find_first_of("<>") != std::string::npos)))
tokType(eBracket);
Is it an expected behaviour? Did I miss something?
(Just trying to improve analysis of expressions inside condition of for loop)
I am not 100% sure but probably it's expected behavior. does it cause a problem?
I am not 100% sure but probably it's expected behavior. does it cause a problem?
In my case it's not a problem. But seems I can't figure out what is the eExtendedOp. I guess It's understandable for ? and :. And , in case of comma operator. It's something like https://en.cppreference.com/w/cpp/language/operator_other ? But what about []() symbols, why it's not a eBracket?
But what about symbols, why it's not a eBracket?
ah yes. It would make sense that square brackets and round brackets are eBracket. Can we try to change this sometime after the release.
But what about symbols, why it's not a eBracket?
ah yes. It would make sense that square brackets and round brackets are eBracket. Can we try to change this sometime after the release.
I guess I could check it next
It seems problem was with a isConstFunctionCall function. Now isConstExpression works fine and I got rid of my own const checker.
But what about symbols, why it's not a eBracket?
ah yes. It would make sense that square brackets and round brackets are eBracket. Can we try to change this sometime after the release.
I guess I could check it next
Also I tried to change and got strange fails in tests: TestUnusedVar::isRecordTypeWithoutSideEffects|cleanFunction|localvar15,56
Also I tried to change and got strange fails in tests: TestUnusedVar::isRecordTypeWithoutSideEffects|cleanFunction|localvar15,56
ok thanks. Well we can try to remember to look into that someday.