libcudacxx Removed block of complex multiply checks

Addresses issue #306 pending verification that none of these checks break standards conformance. As noted in the linked issue, a large performance penalty can happen in certain applications that heavily use complex multiplies by introducing extra branches and instructions. This PR only performs the multiple without checking for NaN.

Aug 26 '22 14:08 cliffburdick

Thanks Cliff! I think this is a reasonable change, but I'll need to defer to the language lawyers to say for sure.

@ericniebler @brycelelbach could you confirm if this is a conforming change?

Aug 26 '22 19:08 jrhemstad

I've never been much of a floating point expert. Without doing some research, I can't tell you whether this is conforming or not.

Aug 27 '22 01:08 ericniebler

Any thoughts on wrapping the removed lines with this:

NV_IF_TARGET(
   NV_IS_HOST,
.... checks ....
)

This way we can keep the slow path intact on CPU? There could be libcxx tests that expect some result from these checks.

I am content to merge as-is though, I'll run tests and see if anything barfs.

Aug 27 '22 19:08 wmaxey

Any thoughts on wrapping the removed lines with this:
NV_IF_TARGET(
   NV_IS_HOST,
.... checks ....
)
This way we can keep the slow path intact on CPU? There could be libcxx tests that expect some result from these checks.

I am content to merge as-is though, I'll run tests and see if anything barfs.

Is there precedent for doing different things on the host and device? Would it be better to have a version in cuda::complex that removes all the checks (not just for multiply).

Aug 27 '22 20:08 cliffburdick

Standard seems to indicate it is conforming and the checks are unnecessary:

"If the result of a function is not mathematically defined or not in the range of representable values for its type, the behavior is undefined" https://eel.is/c++draft/complex.numbers#general-3.sentence-1

The checks should be an opt-in as the impact of doing those checks impact a lot more people negatively than not doing them does. This would also match libstdc++ behavior.

There are other checks in other operators that we may also want to remove.

Aug 30 '22 21:08 luitjens

Hey @cliffburdick, after internal discussion, I don't think we are ready to fully commit to removing these checks by default.

I think as a short-term solution, we could guard these checks behind a macro that allows you to opt-out of them. Maybe something like __LIBCUDACXX_NO_IEC_559_COMPLEX_.

Aug 30 '22 21:08 jrhemstad

Hey @cliffburdick, after internal discussion, I don't think we are ready to fully commit to removing these checks by default.

I think as a short-term solution, we could guard these checks behind a macro that allows you to opt-out of them. Maybe something like __LIBCUDACXX_NO_IEC_559_COMPLEX_.

Thanks. Do we need to wait for consensus on the macro name and/or the way we set it?

Sep 02 '22 21:09 cliffburdick

As a side note, neither of the other standard libraries does this. So we would be in good company if we removed it. Also none of the libc++ standard library tests of complex rely on this behavior.

Also from the definition of the operators here https://eel.is/c++draft/complex.numbers#complex.member.ops-13 I do not see why that would be necessary

Dec 07 '22 13:12 miscco

As a side note, neither of the other standard libraries does this. So we would be in good company if we removed it. Also none of the libc++ standard library tests of complex rely on this behavior.

Also from the definition of the operators here https://eel.is/c++draft/complex.numbers#complex.member.ops-13 I do not see why that would be necessary

I explained a bit of the historical context of why I think this logic is in libc++: https://github.com/NVIDIA/libcudacxx/issues/306#issuecomment-1227812190

Dec 07 '22 16:12 jrhemstad

That explains where it came from but doesn't explain why we keep it in.

Dec 07 '22 16:12 luitjens

Thinking out loud here: there may be situations where it is desirable to have access to both the "safe" multiply and the optimized one in the same compilation unit. E.g., for 99% of their execution they may desire the optimized variant, but there may be sensitive parts that require careful NaN handling.

Could the body of operator* be moved into a new sub function, say mul or whatever, which is templated as to whether to include the NaN / Infinity preservation. The feature macro around operator* would then be used to set the behaviour of whether this boolean template option is enabled or not when it calling the sub function, but the user would still be able to call the underlying subfunction directly if needed.

Feb 23 '23 17:02 maddyscientist

Hi, I have opened a PR #397 for the implementation of constexpr complex

I added a feature flag, that lets you remove the expensive denormal handling but keeps the existing behavior by default. I will close this PR in favor of the other one. If you have any further comments please feels free to comment on the other PR

Mar 17 '23 09:03 miscco

libcudacxx libcudacxx copied to clipboard

Removed block of complex multiply checks

libcudacxx
libcudacxx copied to clipboard