Draft of a simulated double type.
This draft provides a custom double type on the device side, which might save some work if a vendor removes the double-precision cores without providing a simulated double.
There is no need to review this PR unless we decide to investigate this direction further.
This can be compiled with CUDA 12.6.
It mainly focuses on the interface and forwards the operations internally to double precision.
This PR disables the conversion custom_double -> double to ensure that no double operation appears directly in the kernels.
Unfortunately, the other direction, double -> custom_double, still has to be available (via static_cast), although implicit conversion should be avoided.
The requirement comes from thrust::complex, which needs T(1.0)/s and therefore requires this constructor from double.
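The conversion rules above can be sketched in plain C++ as follows. This is only a minimal illustration under assumptions, not the code in this PR: the class layout, the operators shown, and the helper names `reciprocal` and `example_usage` are hypothetical, and device annotations are left out so the snippet compiles on the host.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical sketch of the interface; names and layout are illustrative.
class custom_double {
public:
    custom_double() = default;

    // double -> custom_double has to be spelled out, e.g. T(1.0) or
    // static_cast<T>(1.0); the implicit conversion is not available.
    explicit constexpr custom_double(double value) : value_(value) {}

    // custom_double -> double is disabled, so a kernel cannot silently
    // fall back to native double arithmetic.
    operator double() const = delete;

    // For now the operation is forwarded to native double internally.
    friend constexpr custom_double operator/(custom_double a, custom_double b)
    {
        return custom_double(a.value_ / b.value_);
    }

    friend constexpr bool operator==(custom_double a, custom_double b)
    {
        return a.value_ == b.value_;
    }

private:
    double value_{};
};

// The three conversion properties described in the text.
static_assert(std::is_constructible<custom_double, double>::value,
              "explicit construction from double is allowed");
static_assert(!std::is_convertible<double, custom_double>::value,
              "implicit double -> custom_double is not allowed");
static_assert(!std::is_convertible<custom_double, double>::value,
              "custom_double -> double is disabled");

// Mirrors the T(1.0) / s pattern that motivates the constructor.
template <typename T>
constexpr T reciprocal(T s)
{
    return T(1.0) / s;
}

inline void example_usage()
{
    assert(reciprocal(custom_double(4.0)) == custom_double(0.25));
}
```

With this shape, an expression such as `1.0 / s` with `s` of type `custom_double` does not compile, which is the intended way to catch accidental double usage.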
If a vendor removes the double-precision cores without providing a simulation, some functions might not be available due to hardware limits, but the others can be simulated in software with more operations or by allowing reinterpretation.
Some things need to be implemented (simulated):
- __shfl_xor_sync: it can be replaced by cooperative groups or by casting.
- load/store: by reinterpreting as a 64-bit integer?
- math operations (+, -, *, /), comparison, sqrt
- likely thrust::complex<custom_double> (if they can change it to T(1.0f), we might not have an issue) and the corresponding abs and sqrt
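As a rough host-side illustration of the load/store and casting items in the list above, the 64-bit payload can be moved as an integer and split into two 32-bit words. On the device, each half could then be passed through a 32-bit __shfl_xor_sync and joined again. The type and function names here (`custom_double_bits`, `load_bits`, `store_bits`, `split`, `join`, `roundtrip_example`) are hypothetical and only meant as a sketch of the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical storage type: 64 bits of payload, no native double member.
struct custom_double_bits {
    std::uint64_t bits;
};

// Load: copy the 8 bytes into a 64-bit integer instead of reading a double.
inline custom_double_bits load_bits(const void* ptr)
{
    custom_double_bits v;
    std::memcpy(&v.bits, ptr, sizeof(v.bits));
    return v;
}

// Store: write the 64-bit integer back to memory.
inline void store_bits(void* ptr, custom_double_bits v)
{
    std::memcpy(ptr, &v.bits, sizeof(v.bits));
}

// Two 32-bit words, each of which fits a 32-bit shuffle.
struct halves {
    std::uint32_t lo;
    std::uint32_t hi;
};

inline halves split(custom_double_bits v)
{
    return {static_cast<std::uint32_t>(v.bits),
            static_cast<std::uint32_t>(v.bits >> 32)};
}

inline custom_double_bits join(halves h)
{
    return {(static_cast<std::uint64_t>(h.hi) << 32) | h.lo};
}

// Splitting and joining must not change the payload.
inline void roundtrip_example()
{
    double in = 3.5;
    double out = 0.0;
    store_bits(&out, join(split(load_bits(&in))));
    assert(out == 3.5);
}
```

None of these helpers performs a double-precision operation; they only move bits, so they would remain usable even without double-precision hardware.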
If the application does not need double precision at all, we have already done the same thing for DPC++ on Intel GPUs without double-precision support. We only need to apply it again to the other backends, which mainly means taking care of accidental uses of double.
Error: The following files need to be formatted:
core/config/property_tree.cpp
core/test/config/property_tree.cpp
include/ginkgo/core/config/property_tree.hpp
You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo