`fp8` and `bfloat16` support
NVIDIA actually has two variants of fp8 with different mantissa/exponent splits (E4M3 and E5M2). bfloat16 is also unique. There's also TensorFloat32, which is really more like a 19-bit "bfloat19" (1 sign, 8 exponent, 10 mantissa bits). Perhaps it would make sense to have a generic float<SizeOfMantissa, SizeOfExponent> type (hackery).
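To make the generic float<SizeOfMantissa, SizeOfExponent> idea concrete, here is a small Python sketch (illustrative only, not ILGPU code) of a decoder parameterized by exponent/mantissa widths. All the formats mentioned above are just different parameter choices:

```python
def decode_float(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode an unsigned bit pattern as an IEEE-style float with the
    given exponent/mantissa widths. Handles normals and subnormals only;
    inf/NaN handling is omitted to keep the sketch short."""
    sign = (bits >> (exp_bits + man_bits)) & 1
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == 0:  # subnormal: no implicit leading 1
        value = (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    else:         # normal: implicit leading 1
        value = (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
    return -value if sign else value

# One decoder covers all the formats in question:
#   fp8 E4M3:  decode_float(bits, exp_bits=4, man_bits=3)
#   fp8 E5M2:  decode_float(bits, exp_bits=5, man_bits=2)
#   bfloat16:  decode_float(bits, exp_bits=8, man_bits=7)
#   TF32:      decode_float(bits, exp_bits=8, man_bits=10)  # 19 bits total
print(decode_float(0b0_0111_100, exp_bits=4, man_bits=3))        # → 1.5
print(decode_float(0b0_01111111_1000000, exp_bits=8, man_bits=7))  # → 1.5
```

A generic type in ILGPU would presumably work the same way at the bit level; the open question is how to surface it in the type system.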
It looks like CUDA provides a few alternate floating-point options, including bf16 and tf32.
This would have to be a CUDA-only feature, as there is no equivalent in OpenCL 2.0.
We already have support for Half. Would we add support for BFloat16 and TensorFloat32 types? @m4rs-mt
@MoFtZ that's why it might make sense to have the generic-sized float type I mentioned, guarded by booleans, e.g. a bool Accelerator.SupportsType(...) function that would let the user choose a different kernel.
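The dispatch pattern behind that guard could look something like the following Python sketch (names like `pick_kernel` and the string-based type tags are purely hypothetical, standing in for the proposed Accelerator.SupportsType(...) check):

```python
# Hypothetical capability-guarded kernel selection: try preferred
# low-precision kernels first, fall back to a baseline the device
# is guaranteed to support.
def pick_kernel(supported_types, kernels, fallback):
    """Return the first kernel whose element type the accelerator
    supports, else the fallback (e.g. a float32 kernel)."""
    for dtype, kernel in kernels:
        if dtype in supported_types:
            return kernel
    return fallback

kernel = pick_kernel(
    supported_types={"float32", "bfloat16"},   # reported by the device
    kernels=[("fp8_e4m3", "fp8 kernel"), ("bfloat16", "bf16 kernel")],
    fallback="fp32 kernel",
)
print(kernel)  # → bf16 kernel
```

The user keeps two (or more) kernel variants and the runtime check decides which one to launch.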
@lostmsu @MoFtZ I think this makes sense to add given that NVIDIA GPUs can take serious advantage of these types. I wonder how we can make this happen in a convenient way. Let's get into more detail on Thursday in our talk-to-dev session.
Based on our last discussions, this is more broadly related to adding support for the CUDA WMMA (Warp Level Matrix Multiply-Accumulate) instructions; adding support for the fp8 and bfloat16 types is not very useful without WMMA support.
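For context, the semantics WMMA provides is a tile-wise D = A·B + C with low-precision inputs and a wider accumulator. A rough Python sketch of that contract (not real WMMA code; the truncating `to_bfloat16` helper is a simplification, since hardware rounds to nearest-even):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Quantize a float32 to bfloat16 by truncating to the top 16 bits
    (real hardware uses round-to-nearest-even; truncation keeps the
    sketch short)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return y

def wmma_like(a, b, c):
    """D = A @ B + C on a small tile: inputs quantized to bfloat16,
    products accumulated in full precision -- the shape of a WMMA
    mma operation."""
    n, k, m = len(a), len(b), len(b[0])
    return [
        [sum(to_bfloat16(a[i][t]) * to_bfloat16(b[t][j]) for t in range(k))
         + c[i][j]
         for j in range(m)]
        for i in range(n)
    ]
```

This is why the types and WMMA belong together: without the matrix instructions, fp8/bf16 values would mostly just be converted back to float32 for arithmetic anyway.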