CUDALibrarySamples
[cuTENSOR] Automatically enable/disable TensorFloat32
(Thanks to @springer13 for making your work on a PyTorch cuTENSOR wrapper public!)
Currently, the Python cuTENSOR wrapper always uses TensorFloat32 (TF32) as the compute dtype for 32-bit float tensors, which is unsupported on GPUs older than Ampere. This PR uses the PyTorch and TensorFlow configuration options for TF32 (which autodetect Ampere) to fall back to regular 32-bit float as the compute dtype when TF32 is not supported.
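For reviewers, here is a minimal sketch of the selection logic described above. It is not the code in this PR: `select_compute_type` and its string return values are hypothetical names chosen for illustration. In practice, `tf32_allowed` would presumably come from a framework setting such as `torch.backends.cuda.matmul.allow_tf32` or `tf.config.experimental.tensor_float_32_execution_enabled()`, and the compute capability from something like `torch.cuda.get_device_capability()`. Which options the wrapper actually reads is an assumption here.

```python
from typing import Tuple


def select_compute_type(
    dtype: str,
    tf32_allowed: bool,
    compute_capability: Tuple[int, int],
) -> str:
    """Hypothetical helper: pick a compute type name for a tensor dtype.

    dtype: name of the tensor's element type, e.g. "float32".
    tf32_allowed: whether the framework's TF32 option is enabled.
    compute_capability: (major, minor) of the GPU; Ampere starts at (8, 0).
    """
    # Only 32-bit float tensors are candidates for TF32 in this sketch;
    # every other dtype is passed through unchanged.
    if dtype != "float32":
        return dtype

    # TF32 tensor cores were introduced with Ampere (compute capability 8.0).
    supports_tf32 = compute_capability >= (8, 0)

    # Use TF32 only when the user allows it AND the hardware supports it;
    # otherwise fall back to regular 32-bit float compute.
    if tf32_allowed and supports_tf32:
        return "tf32"
    return "float32"
```

With this sketch, a float32 tensor on a compute capability 7.5 GPU resolves to regular float32 compute even if the framework flag is on, and an Ampere GPU with the flag turned off does the same.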