stan::math and std::complex
Description
Currently, Stan's complex number support is entirely built on std::complex, including autodiff, which
uses std::complex<stan::math::var>.
This is, unfortunately, unspecified behavior in the C++ spec [26.4.2]:
The effect of instantiating the template complex for any type other than float, double, or long double is unspecified. The specializations complex<float>, complex<double>, and complex<long double> are literal types.
For a reminder on what "unspecified behavior" means:
unspecified behavior - the behavior of the program varies between implementations, and the conforming implementation is not required to document the effects of each behavior. Each unspecified behavior results in one of a set of valid results.
Essentially, unspecified behavior is the same as "implementation-defined behavior" but without the requirement that implementations document what they are doing. This is also often taken to mean there are no backwards compatibility guarantees on any specific unspecified behavior.
This creates both a maintenance burden (each new libstdc++/libc++ release can create arbitrary amounts of work for our developers) and a stability hazard (the idea that "Stan X.Y will continue to work a year from now, without needing to update to Stan X.Z" is false as things stand today).
Problems
Recent versions of clang/libc++ have made changes that they are fully within their rights to make under the spec, but that have broken Stan builds.
- In libc++ 16, they changed the definition of log(complex) from complex<T>(log(abs(x)), arg(x)); to complex<T>(std::log(std::abs(x)), std::arg(x));. This broke argument-dependent lookup for this function. A similar change broke operator* for our complex types. This led to https://github.com/stan-dev/cmdstan/issues/1158, which was the reason we needed a 2.32.1 release. @andrjohns provided the fix in https://github.com/stan-dev/math/pull/2892
- In libc++ 17, a similar change was made to fabs, which necessitated https://github.com/stan-dev/math/pull/2991
- In libc++ 19, the internal structure of pow was rewritten such that several overloads lead to a static assert failing if the type passed is not arithmetic: #3106
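To make the failure mode concrete, here is a minimal sketch of why qualifying a call changes which overloads are considered. demo::Var is a hypothetical stand-in for stan::math::var, and fake_stdlib is a hypothetical stand-in for the standard library's internals; neither is Stan's or libc++'s actual code.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

namespace demo {
// Hypothetical stand-in for an autodiff scalar such as stan::math::var.
struct Var {
  double val;
};
// Lives next to the type, so it is only found through ADL
// (or through an explicit demo::log call).
inline Var log(const Var& x) { return Var{std::log(x.val)}; }
}  // namespace demo

namespace fake_stdlib {
using std::log;  // mimic code inside namespace std, where log names std::log

// "Old" style: unqualified call, so ADL also searches the argument's namespace.
template <typename T>
auto log_unqualified(const T& x) -> decltype(log(x)) {
  return log(x);
}

// "New" style: qualified call, so ADL is suppressed and only std::log is considered.
template <typename T>
auto log_qualified(const T& x) -> decltype(std::log(x)) {
  return std::log(x);
}
}  // namespace fake_stdlib

// Detection helpers: true when the corresponding call is well-formed for T.
template <typename T>
constexpr auto unqualified_ok(int)
    -> decltype(void(fake_stdlib::log_unqualified(std::declval<const T&>())), true) {
  return true;
}
template <typename T>
constexpr bool unqualified_ok(...) {
  return false;
}

template <typename T>
constexpr auto qualified_ok(int)
    -> decltype(void(fake_stdlib::log_qualified(std::declval<const T&>())), true) {
  return true;
}
template <typename T>
constexpr bool qualified_ok(...) {
  return false;
}

static_assert(unqualified_ok<demo::Var>(0), "ADL finds demo::log");
static_assert(!qualified_ok<demo::Var>(0), "std::log has no overload for demo::Var");
static_assert(qualified_ok<double>(0), "built-in types are unaffected");
```

The sketch uses detection helpers rather than a direct call because the qualified form simply does not compile for demo::Var, which is the same kind of build break described in the list above.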
What to do
This is less clear to me.
Option 1 - walk on eggshells
So far, all of the issues that have arisen from this have been due to argument-dependent lookup (ADL) breaking for these types. We can fix that by being much more explicit about which overloads we call, as we did in https://github.com/stan-dev/math/pull/2892 and https://github.com/stan-dev/math/pull/2991. This requires auditing the existing usages, which probably requires a fair amount of C++ expertise to understand how the calls are being resolved.
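As a rough illustration of what "being explicit" can look like, the sketch below defines an overload for the complex-of-scalar type and fully qualifies every call inside it, so nothing depends on how the standard library's own templates resolve names. This is not a reproduction of what the linked PRs do; demo::Var and the demo:: functions are hypothetical stand-ins, and std::complex<demo::Var> is used only as a data holder (constructor, real(), imag()), which is itself still subject to the unspecified-behavior caveat.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

namespace demo {
// Hypothetical stand-in for an autodiff scalar such as stan::math::var.
struct Var {
  double val;
  explicit Var(double v = 0.0) : val(v) {}
};

// Scalar functions owned by the library.
inline Var log(const Var& x) { return Var(std::log(x.val)); }
inline Var hypot(const Var& a, const Var& b) {
  return Var(std::hypot(a.val, b.val));
}
inline Var atan2(const Var& y, const Var& x) {
  return Var(std::atan2(y.val, x.val));
}

// Explicit overload for the complex-of-Var type. Every call is fully
// qualified, so no step relies on ADL or on standard library internals.
inline std::complex<Var> log(const std::complex<Var>& z) {
  return std::complex<Var>(demo::log(demo::hypot(z.real(), z.imag())),
                           demo::atan2(z.imag(), z.real()));
}
}  // namespace demo
```

Call sites would then write demo::log(z) rather than relying on the std::log template being instantiated for the custom scalar. The audit mentioned above amounts to finding every place where that is not yet the case.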
Option 2 - our own type
We could rather trivially define our own stan::math::complex<T> type. We could make it assignable from std::complex<double>, and I think we would then be off to the races? I believe the complex linear algebra routines we use in Eigen all accept a template argument for the complex type, rather than assuming std::complex.
This would require a fair amount of boilerplate to actually do any math on it, and in the case of double we may lose out on some of the optimizations that having the type built in to the language grants, but we'd own it.
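For a sense of the boilerplate involved, here is a minimal sketch of such a type. The namespace demo, the member names, and the choice of operators are all assumptions made for illustration; a real stan::math::complex<T> would need the full set of arithmetic and math functions, plus whatever traits Eigen requires for a custom scalar type.

```cpp
#include <cassert>
#include <complex>

namespace demo {
// Hypothetical library-owned complex type: its behavior for any T is
// defined by this code, not by the standard library implementation.
template <typename T>
class complex {
 public:
  complex(const T& re = T(), const T& im = T()) : re_(re), im_(im) {}

  // Conversion from std::complex<double>, so double-valued results can
  // be assigned directly.
  complex(const std::complex<double>& z) : re_(z.real()), im_(z.imag()) {}

  const T& real() const { return re_; }
  const T& imag() const { return im_; }

 private:
  T re_;
  T im_;
};

template <typename T>
complex<T> operator+(const complex<T>& a, const complex<T>& b) {
  return complex<T>(a.real() + b.real(), a.imag() + b.imag());
}

// Textbook complex product; no special handling of NaN or infinity.
template <typename T>
complex<T> operator*(const complex<T>& a, const complex<T>& b) {
  return complex<T>(a.real() * b.real() - a.imag() * b.imag(),
                    a.real() * b.imag() + a.imag() * b.real());
}
}  // namespace demo
```

Every operator here only requires that T supports ordinary arithmetic, so the same code path serves double and an autodiff scalar alike. The cost is that none of the tuned std::complex<double> code paths are reused.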
@WardBrian, thanks for writing this down! I think relying on behavior that the C++ specification leaves unspecified leads directly to this kind of breakage.
Thank you for laying out the two options. They seem like the reasonable set of options; I can't think of another option to consider.
Tradeoffs Between Option 1 and 2
In general, I think Option 1 is better than Option 2 only under these conditions:
- the footprint of the unspecified behavior is small (in the sense that we can keep supporting it as C++ compilers and libraries change how they handle it)
- our view of how it should work is consistent
- we can set up tests for the unspecified behavior that will fail when the behavior we're relying on changes
- we are able to selectively override behavior to get what we need
I think Option 2 is better than Option 1 here, even if we have to write tests for almost all the behavior we rely on.
I prefer Option 2, since it eventually gets us to an island of stability. The downside is that it more or less needs to be done monolithically, all at once, which is a lot of work.
With Option 1, it seems like we could eventually reach some kind of stable state where we're not relying on ADL at all (and we're testing against new clang/gcc versions to catch breaks early), but then the compilers could decide to break something else too.