cppcheck icon indicating copy to clipboard operation
cppcheck copied to clipboard

New check: use memcpy/memset instead of loop

Open ntfshard opened this issue 3 years ago • 12 comments

I'd like to propose a new performance check. I noticed in 2 projects some bad pattern: instead of calling memcpy authors prefer hand-made for loop with copying byte-by-byte.

Examples: https://github.com/openvinotoolkit/openvino/pull/10996/files https://github.com/openscad/openscad/blob/master/src/ext/lodepng/lodepng.cpp#L120

I tested 3 modern compilers: gcc, msvc, clang (all with O2 key). Only clang generates something looks like a vectorized version (in other words it recognized the pattern). gcc and msvc used movzx opcode.

I still not sure where is the best place for this check, due it works not only for c++.

ntfshard avatar Jul 09 '22 23:07 ntfshard

Yes as you say this does not belong in CheckStl. Not sure where it's better. I am thinking about CheckFunction or CheckOther. CheckOther is very large already so therefore I prefer that you use CheckFunction.

I am thinking that it might be possible with some false positives:

for (i = 0; i < count; i += 2) {   ....  }  // FP increment with 2
for (....) {  a[i1] = b[i2];  }   // FP different indexes (use memset instead?)
for (i = 0; b[i] != 0; i++) {   ...  }   // FP unknown count

Thank you for feedback. I guess I'll try to figure out how to use extractForLoopValues function here. 2nd case should be with cast expressions, just clarification.

Can you please run the tools/test-my-pr.py script? Do you use Windows? I am not sure if that script works well in windows.

I have both, and Linux is more preferable for development purposes 🐧

And one more question, hand-made memset, shall we check it too?

ntfshard avatar Jul 10 '22 14:07 ntfshard

And one more question, hand-made memset, shall we check it too?

I am not against it.

danmar avatar Jul 10 '22 15:07 danmar

For information.. I do feel a bit skeptic about the performance category. Cppcheck can't know what difference each suggestion will make. I believe that in general, readability is more important than micro optimisations.

However for this particular check that you have implemented.. in my humble opinion it should improve readability to use memcpy instead of a raw loop.

For memset it can be questionable if it improves readability in some cases.

danmar avatar Jul 10 '22 15:07 danmar

For information.. I do feel a bit skeptic about the performance category. Cppcheck can't know what difference each suggestion will make. I believe that in general, readability is more important than micro optimisations.

However for this particular check that you have implemented.. in my humble opinion it should improve readability to use memcpy instead of a raw loop.

For memset it can be questionable if it improves readability in some cases.

You suggesting to treat it as useStlAlgorithm? I guess it's ok. I still could left a comment about performance in the message.

My motivation about memset is a similar to memcpy: vectorized code should be much faster. But in this way it should be implemented for all standard functions, including detecting bubble sort (laugh)

Maybe the other question is more important: is this check OK in CppCheck philosophy. I don't want to remove check later.

ntfshard avatar Jul 10 '22 18:07 ntfshard

I tested 3 modern compilers: gcc, msvc, clang (all with O2 key). Only clang generates something looks like a vectorized version (in other words it recognized the pattern). gcc and msvc used movzx opcode.

If the compilers produce sub-optimal code it would make sense to report this upstream.

For LLVM this should be easy since they are also on GitHub - GCC and MSVC need additional accounts.

firewave avatar Jul 10 '22 19:07 firewave

I tested 3 modern compilers: gcc, msvc, clang (all with O2 key). Only clang generates something looks like a vectorized version (in other words it recognized the pattern). gcc and msvc used movzx opcode.

If the compilers produce sub-optimal code it would make sense to report this upstream.

For LLVM this should be easy since they are also on GitHub - GCC and MSVC need additional accounts.

I don't want to call it 'suboptimal', this code is doing exactly what the user asking for: byte by byte copying. I don't want to blame the tools in this case.

ntfshard avatar Jul 10 '22 19:07 ntfshard

I don't want to call it 'suboptimal', this code is doing exactly what the user asking for: byte by byte copying. I don't want to blame the tools in this case.

It's not blaming. The compilers try to detect such pattern which will internally replaced with "lib" calls which can do a better code optimization which might be enable to omit more simplified call and avoid the library call completely. Same is done for pattern which can be modeled as instructions instead.

As an example look at the recent work for for ``memchr(): https://github.com/llvm/llvm-project/commit/516915beb5ee5012a9a8b162fc29664a8c247ec3 https://github.com/llvm/llvm-project/commit/e263a7670e28d880ec45971f91fa88de01fc51e1 https://github.com/llvm/llvm-project/commit/5ccfd5f6d43069a5bc152b335bc704c6d2584ddc

I also remember a detection for a common pattern which occurs in zlib which lead to a 10% performance increase when compiled with clang: https://github.com/llvm/llvm-project/commit/d870aea03ecb2387c215afb16f5e2073718d5aa4

In case of memcpy there might not be as much (or any) room but might be still worth pointing it out.

firewave avatar Jul 10 '22 19:07 firewave

I'm not aware of clang internals, if you believe that it should be reported, could you please report it. Not because I'm lazy, just because you know how it report properly, like how it should work. And also it's effect of running inside godbolt, not in regular environment.

But in general I don't want to rely on compiler optimizations, it's a free bonus but also it's a vendor lock and hard to control the result.

ntfshard avatar Jul 10 '22 20:07 ntfshard

FYI https://github.com/llvm/llvm-project/issues/56467

firewave avatar Jul 10 '22 22:07 firewave

FYI https://github.com/llvm/llvm-project/issues/56662 - so even with the successful detection of memset() there might still be issues.

firewave avatar Jul 21 '22 20:07 firewave

FYI https://github.com/llvm/llvm-project/issues/47852

firewave avatar Jul 25 '22 23:07 firewave

FYI https://github.com/llvm/llvm-project/commit/e079bf655832b2261faf6883984400a8c37b1baa

Will be the last one - sorry for spamming. Just highlighting what a compiler is doing internally for such calls so it's not always necessary to write the best possible code just to get the best possible performance.

So please consider filing these shortcomings upstream. As you need to follow up on these things it doesn't make sense that way for me to file these either since I did not encounter these issues. Thanks.

firewave avatar Jul 27 '22 15:07 firewave

Hmm, I little bit confused. I have a line for (size_t i = 0; i < 64*(sizeof(int) + 42); i++) { in testcase. And a token (, from here: 64*(sizeof has a tokType() == 17, which is a eExtendedOp enum entry. Not too much cases of usage this tokType in the project. I guess I need some clarification.

In token.cpp:145-146 there are lines:

else if (mStr.size() == 1 && mStr.find_first_of(",[]()?:") != std::string::npos)
    tokType(eExtendedOp);

Brackets has such set expression in token.cpp:170-171:

else if (mStr.size() == 1 && (mStr.find_first_of("{}") != std::string::npos || (mLink && mStr.find_first_of("<>") != std::string::npos)))
    tokType(eBracket);

Is it an expected behaviour? Did I miss something?

(Just trying to improve analysis of expressions inside condition of for loop)

ntfshard avatar Aug 15 '22 19:08 ntfshard

I am not 100% sure but probably it's expected behavior. does it cause a problem?

danmar avatar Aug 16 '22 20:08 danmar

I am not 100% sure but probably it's expected behavior. does it cause a problem?

In my case it's not a problem. But seems I can't figure out what is the eExtendedOp. I guess It's understandable for ? and :. And , in case of comma operator. It's something like https://en.cppreference.com/w/cpp/language/operator_other ? But what about []() symbols, why it's not a eBracket?

ntfshard avatar Aug 16 '22 20:08 ntfshard

But what about symbols, why it's not a eBracket?

ah yes. It would make sense that square brackets and round brackets are eBracket. Can we try to change this sometime after the release.

danmar avatar Aug 20 '22 19:08 danmar

But what about symbols, why it's not a eBracket?

ah yes. It would make sense that square brackets and round brackets are eBracket. Can we try to change this sometime after the release.

I guess I could check it next

ntfshard avatar Aug 20 '22 23:08 ntfshard

It seems problem was with a isConstFunctionCall function. Now isConstExpression works fine and I got rid of my own const checker.

ntfshard avatar Aug 26 '22 02:08 ntfshard

But what about symbols, why it's not a eBracket?

ah yes. It would make sense that square brackets and round brackets are eBracket. Can we try to change this sometime after the release.

I guess I could check it next

Also I tried to change and got strange fails in tests: TestUnusedVar::isRecordTypeWithoutSideEffects|cleanFunction|localvar15,56

ntfshard avatar Sep 03 '22 21:09 ntfshard

Also I tried to change and got strange fails in tests: TestUnusedVar::isRecordTypeWithoutSideEffects|cleanFunction|localvar15,56

ok thanks. Well we can try to remember to look into that someday.

danmar avatar Sep 04 '22 07:09 danmar