cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

Shall we fix all those inefficient (improper?) use of "pow"?

Open VinInn opened this issue 11 months ago • 7 comments

most of our use of pow is inefficient if not improper: https://github.com/search?q=repo%3Acms-sw%2Fcmssw+pow%28+language%3AC%2B%2B&type=code&l=C%2B%2B&p=5

  1. pow(x,2) : no the compiler will not substitute with x*x: we need to introduce our own inline function "square"
  2. pow(x,0.5) pow(x,1/3) result correct (yes even for negative x) still a direct call to sqrt and cbrt may be more efficient
  3. pow(2,j), pow(10,j), pow (-1, j): no comment
  4. pow(x,j) in polynomial: Horner shall be used
  5. pow(a,y) where a is a literal: exp(y*log(a)) should be faster provided that the compiler makes log(a) constexpr

shall we try to fix all or them or wait that some go above the threshold of our performance-radar?

VinInn avatar Mar 15 '24 10:03 VinInn

cms-bot internal usage

cmsbuild avatar Mar 15 '24 10:03 cmsbuild

A new Issue was created by @VinInn.

@smuzaffar, @antoniovilela, @Dr15Jones, @sextonkennedy, @rappoccio, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

cmsbuild avatar Mar 15 '24 10:03 cmsbuild

pow(x,2) : no the compiler will not substitute with x*x: we need to introduce our own inline function "square"

depends on the optimization flags according to some old posts

slava77 avatar Mar 15 '24 13:03 slava77

pow(x,2) : no the compiler will not substitute with x*x: we need to introduce our own inline function "square"

depends on the optimization flags according to some old posts

At the very least we should be encouraging using std::pow(), std::sqrt() etc., which will let the compiler use the appropriate type-matched inlined function. This may invoke a compiler builtin implementation, and should avoid any unnecessary type promotions to double (esp. important consideration for vectorization). I'm not convinced it's worth a campaign w/o evidence that these are performance bottlenecks.

dan131riley avatar Mar 15 '24 16:03 dan131riley

indeed. but "std::pow(x,2)+std::pow(y,2);" is promoted to double... (as any literal non explicitely float) https://godbolt.org/z/ab1Koqvd8

VinInn avatar Mar 15 '24 18:03 VinInn

Interesting that more complex expression lead to such differences (although maybe not entirely surprising either), I'd say that discovery increases the motivation to do something. Some evidence on the cost would still be nice.

makortel avatar Mar 15 '24 19:03 makortel

Hadn't thought of that. Will also note (one of my hobby horses) that "std::pow(x, 1.f/3.f);" is very different from "std::pow(x, 1./3.);". The former jumps straight to powf(), while the latter converts everything to double, calls pow(), and converts the result back to float.

dan131riley avatar Mar 15 '24 20:03 dan131riley

according to experts: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114363

float z = pow(x,2) is immediately converted to z=x*x because it is guaranteed to produce the same results of invoking the double precision function. all other cases not, so the optimization cannot be applied.

VinInn avatar Mar 16 '24 15:03 VinInn

Tying back to my previous comment, "std::pow(x,2.f)+std::pow(y,2.f);" avoids the type promotion, and compiles down to two instructions if FMA is available. Fun, wasn't expecting that either.

dan131riley avatar Mar 16 '24 15:03 dan131riley

@dan131riley yeah, we shall just teach people to use 2.f ...

VinInn avatar Mar 16 '24 15:03 VinInn

@dan131riley yeah, we shall just teach people to use 2.f ...

Oh, I agree, teaching average physicists these corner cases is a non-starter. I mostly have the MkFit developers on board, but that's an experienced bunch. Just thinking about what we can do...

dan131riley avatar Mar 16 '24 15:03 dan131riley