math stan::math and std::complex

Description

Currently, Stan's complex number support is entirely built on std::complex, including autodiff, which uses std::complex<stan::math::var>.

This is, unfortunately, unspecified behavior in the C++ spec [26.4.2]:

The effect of instantiating the template complex for any type other than float, double, or long double is unspecified. The specializations complex<float>, complex<double>, and complex<long double> are literal types

For a reminder on what "unspecified behavior" means:

unspecified behavior - the behavior of the program varies between implementations, and the conforming implementation is not required to document the effects of each behavior. Each unspecified behavior results in one of a set of valid results.

Essentially, unspecified behavior is the same as "implementation-defined behavior" but without the requirement that implementations document what they are doing. This is also often taken to mean there are no backwards compatibility guarantees on any specific unspecified behavior.

This creates both a maintenance burden (each new libstdc++/libc++ release can create arbitrary amounts of work for our developers) and a stability hazard (the idea that "Stan X.Y will continue to work a year from now, without needing to update to Stan X.Z" is false as things stand today).

Problems

Recent versions of clang/libc++ have made changes which the spec fully entitles them to make, but which have broken Stan builds.

In libc++ 16, the definition of log(complex) changed from complex<T>(log(abs(x)), arg(x)); to complex<T>(std::log(std::abs(x)), std::arg(x));. This broke argument-dependent lookup for this function. A similar change broke operator* for our complex types. This led to https://github.com/stan-dev/cmdstan/issues/1158, which was the reason we needed a 2.32.1 release. @andrjohns provided the fix in https://github.com/stan-dev/math/pull/2892
In libc++ 17, a similar change was made to fabs, which necessitated https://github.com/stan-dev/math/pull/2991
In libc++ 19, the internal structure of pow was rewritten such that several overloads lead to a static assert failing if the type passed is not arithmetic: #3106
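
To make the first failure mode concrete, here is a minimal sketch of how qualifying a call changes which overloads are considered. Everything in it is hypothetical: toy::var stands in for an autodiff scalar such as stan::math::var, and namespace lib stands in for a standard library implementation. It is not the actual library source, only an illustration of the lookup rule.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>
#include <utility>

namespace toy {  // hypothetical stand-in for stan::math
struct var {
  double val;
};
inline var abs(var x) { return var{std::fabs(x.val)}; }
inline var log(var x) { return var{std::log(x.val)}; }
}  // namespace toy

namespace lib {  // hypothetical stand-in for a standard library
inline double abs(double x) { return std::fabs(x); }
inline double log(double x) { return std::log(x); }

// "Old" style: unqualified calls. For T = toy::var, argument-dependent
// lookup also searches namespace toy and finds toy::abs and toy::log.
template <class T>
T log_abs_unqualified(T x) {
  return log(abs(x));
}

// "New" style: qualified calls, which switch ADL off. This detector is
// only ever used inside decltype; it reports whether the qualified
// expression lib::log(lib::abs(x)) is well-formed for a given T.
template <class T>
auto qualified_call_compiles(int)
    -> decltype(void(lib::log(lib::abs(std::declval<T>()))),
                std::true_type{});
template <class T>
std::false_type qualified_call_compiles(...);
}  // namespace lib

// Built-in types are unaffected by the qualification...
static_assert(decltype(lib::qualified_call_compiles<double>(0))::value,
              "qualified call works for double");
// ...but the user-defined scalar is no longer found.
static_assert(!decltype(lib::qualified_call_compiles<toy::var>(0))::value,
              "qualified call cannot see toy::abs or toy::log");

// Runtime check that the unqualified version works for both types.
inline void run_checks() {
  assert(lib::log_abs_unqualified(-1.0) == 0.0);
  assert(lib::log_abs_unqualified(toy::var{-1.0}).val == 0.0);
}
```

The unqualified template keeps working for toy::var, while the qualified form is rejected for it at compile time, which is the same shape as the breakage described above.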

What to do

This is less clear to me.

Option 1 - walk on egg shells

So far, all of the issues that have arisen from this have been due to argument dependent lookup breaking for these types. We can fix that by being much more explicit, as we did in https://github.com/stan-dev/math/pull/2892 and https://github.com/stan-dev/math/pull/2991. This requires auditing the existing usages, which probably requires a fair amount of C++ expertise to understand how the calls are being resolved.
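
As a rough illustration of what "being much more explicit" can look like, the sketch below supplies a non-template log overload for std::complex<toy::var> and qualifies every scalar call inside it, so nothing depends on how the library's generic std::log template is written. The toy namespace and its functions are hypothetical stand-ins, not the code from the linked PRs. Note that the example itself instantiates std::complex on a non-floating-point type, which is exactly the unspecified case this issue is about, so it is only expected to build on implementations that tolerate that.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

namespace toy {  // hypothetical stand-in for stan::math
struct var {
  double val;
};

// Scalar functions that are always called with explicit qualification.
inline var hypot(var a, var b) { return var{std::hypot(a.val, b.val)}; }
inline var atan2(var y, var x) { return var{std::atan2(y.val, x.val)}; }
inline var log(var x) { return var{std::log(x.val)}; }

// Explicit overload for the complex type. A non-template function is
// preferred over a function template when both match equally well, so
// log(z) resolves here, and the body never relies on name lookup
// performed inside the standard library.
inline std::complex<var> log(const std::complex<var>& z) {
  const var re = z.real();
  const var im = z.imag();
  return std::complex<var>(toy::log(toy::hypot(re, im)),
                           toy::atan2(im, re));
}
}  // namespace toy

// Small helper for checking the result of log(1 + 0i), which is 0 + 0i.
inline bool log_of_one_is_zero() {
  const std::complex<toy::var> one(toy::var{1.0}, toy::var{0.0});
  const std::complex<toy::var> r = toy::log(one);
  return r.real().val == 0.0 && r.imag().val == 0.0;
}
```

The cost of this approach is that every function used on complex autodiff types needs such an overload, which is where the audit mentioned above comes in.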

Option 2 - our own type

We could rather trivially define our own stan::math::complex<T> type. We could make it assignable from std::complex<double>, and I think we'd be off to the races? I believe the complex linear algebra routines we use in Eigen all support a template argument for the complex type, rather than assuming std::complex. This would require a fair amount of boilerplate to actually do any math on it, and in the case of double we may lose out on some of the optimizations that having the type built into the language grants, but we'd own it.
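
A minimal sketch of the shape such a type might take, assuming a hypothetical sketch::complex<T> (the namespace, members, and operator set are illustrative, not an existing Stan API). It shows the conversion from std::complex<double> and a small sample of the arithmetic boilerplate that would have to be written by hand.

```cpp
#include <cassert>
#include <complex>

namespace sketch {  // hypothetical; not an existing stan::math namespace
template <class T>
class complex {
 public:
  complex(const T& re = T(), const T& im = T()) : re_(re), im_(im) {}

  // Allow construction and assignment from std::complex<double>,
  // an instantiation the standard fully specifies.
  complex(const std::complex<double>& z) : re_(z.real()), im_(z.imag()) {}

  const T& real() const { return re_; }
  const T& imag() const { return im_; }

 private:
  T re_;
  T im_;
};

template <class T>
complex<T> operator+(const complex<T>& a, const complex<T>& b) {
  return complex<T>(a.real() + b.real(), a.imag() + b.imag());
}

// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
template <class T>
complex<T> operator*(const complex<T>& a, const complex<T>& b) {
  return complex<T>(a.real() * b.real() - a.imag() * b.imag(),
                    a.real() * b.imag() + a.imag() * b.real());
}
}  // namespace sketch

// Example: (1 + 2i) * (3 + 4i) = -5 + 10i.
inline sketch::complex<double> example_product() {
  sketch::complex<double> a = std::complex<double>(1.0, 2.0);
  sketch::complex<double> b(3.0, 4.0);
  return a * b;
}
```

Because the operators only use T's own + , - and *, the same template would work for an autodiff scalar without the standard library ever seeing that type. The remaining functions (log, pow, exp, and so on) are the boilerplate referred to above.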

Jan 19 '24 15:01 WardBrian

@WardBrian, thanks for writing this down! I think relying on behavior that the C++ specification leaves unspecified leads directly to this.

Thank you for laying out the two options. They seem like the reasonable set of options; I can't think of another option to consider.

Tradeoffs Between Option 1 and 2

In general, I think Option 1 is better than Option 2 only under these conditions:

the footprint of the unspecified behavior is small (in the sense that we can support it as C++ compilers and libraries change their stance on it)
how we think it should work is consistent
we can set up tests for the unspecified behavior that will trigger when behavior we're relying on changes
we are able to selectively override behavior to what we need

I think Option 2 is better than Option 1 here, even if we have to write tests for almost all the behavior we rely on.

Jan 19 '24 16:01 syclik

I prefer option 2, since it eventually gets us to an island of stability. The downside is that it kind of needs to be done monolithically/all at once, which is a lot of work.

Option 1 seems like we could eventually reach some kind of stable state where we're not using any ADL at all (and we're testing against new clang/gcc versions to catch breaks early), but then the compilers could decide to break something else too.

Jan 19 '24 16:01 WardBrian