libcudacxx
libcudacxx copied to clipboard
Removed block of complex multiply checks
Addresses issue #306 pending verification that none of these checks break standards conformance. As noted in the linked issue, a large performance penalty can happen in certain applications that heavily use complex multiplies by introducing extra branches and instructions. This PR only performs the multiple without checking for NaN.
Thanks Cliff! I think this is a reasonable change, but I'll need to defer to the language lawyers to say for sure.
@ericniebler @brycelelbach could you confirm if this is a conforming change?
I've never been much of a floating point expert. Without doing some research, I can't tell you whether this is conforming or not.
Any thoughts on wrapping the removed lines with this:
NV_IF_TARGET(
NV_IS_HOST,
.... checks ....
)
This way we can keep the slow path intact on CPU? There could be libcxx tests that expect some result from these checks.
I am content to merge as-is though, I'll run tests and see if anything barfs.
Any thoughts on wrapping the removed lines with this:
NV_IF_TARGET( NV_IS_HOST, .... checks .... )This way we can keep the slow path intact on CPU? There could be libcxx tests that expect some result from these checks.
I am content to merge as-is though, I'll run tests and see if anything barfs.
Is there precedent for doing different things on the host and device? Would it be better to have a version in cuda::complex that removes all the checks (not just for multiply).
Standard seems to indicate it is conforming and the checks are unnecessary:
"If the result of a function is not mathematically defined or not in the range of representable values for its type, the behavior is undefined" https://eel.is/c++draft/complex.numbers#general-3.sentence-1
The checks should be an opt-in as the impact of doing those checks impact a lot more people negatively than not doing them does. This would also match libstdc++ behavior.
There are other checks in other operators that we may also want to remove.
Hey @cliffburdick, after internal discussion, I don't think we are ready to fully commit to removing these checks by default.
I think as a short-term solution, we could guard these checks behind a macro that allows you to opt-out of them. Maybe something like __LIBCUDACXX_NO_IEC_559_COMPLEX_.
Hey @cliffburdick, after internal discussion, I don't think we are ready to fully commit to removing these checks by default.
I think as a short-term solution, we could guard these checks behind a macro that allows you to opt-out of them. Maybe something like
__LIBCUDACXX_NO_IEC_559_COMPLEX_.
Thanks. Do we need to wait for consensus on the macro name and/or the way we set it?
As a side note, neither of the other standard libraries does this. So we would be in good company if we removed it. Also none of the libc++ standard library tests of complex rely on this behavior.
Also from the definition of the operators here https://eel.is/c++draft/complex.numbers#complex.member.ops-13 I do not see why that would be necessary
As a side note, neither of the other standard libraries does this. So we would be in good company if we removed it. Also none of the libc++ standard library tests of complex rely on this behavior.
Also from the definition of the operators here https://eel.is/c++draft/complex.numbers#complex.member.ops-13 I do not see why that would be necessary
I explained a bit of the historical context of why I think this logic is in libc++: https://github.com/NVIDIA/libcudacxx/issues/306#issuecomment-1227812190
That explains where it came from but doesn't explain why we keep it in.
Thinking out loud here: there may be situations where it is desirable to have access to both the "safe" multiply and the optimized one in the same compilation unit. E.g., for 99% of their execution they may desire the optimized variant, but there may be sensitive parts that require careful NaN handling.
Could the body of operator* be moved into a new sub function, say mul or whatever, which is templated as to whether to include the NaN / Infinity preservation. The feature macro around operator* would then be used to set the behaviour of whether this boolean template option is enabled or not when it calling the sub function, but the user would still be able to call the underlying subfunction directly if needed.
Hi, I have opened a PR #397 for the implementation of constexpr complex
I added a feature flag, that lets you remove the expensive denormal handling but keeps the existing behavior by default. I will close this PR in favor of the other one. If you have any further comments please feels free to comment on the other PR