[Tracking] FLUX T5 XXL model produces NaN when on CUDA and using F16
Perhaps we can use clamping, as transformers does in its T5 implementation:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L748-L755
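For reference, here is a minimal candle-style sketch of that workaround. The function name is hypothetical, and it assumes candle's `Tensor::clamp` accepts scalar bounds; it also simplifies the transformers code, which only backs off by 1000 from `f16::MAX` when infs are actually detected:

```rust
use candle_core::{DType, Result, Tensor};

/// Pull F16 activations back inside the representable range, mirroring
/// the workaround in transformers' modeling_t5.py: squash any overflow
/// before it turns into NaN in the next layer.
fn clamp_f16_hidden_states(hidden_states: &Tensor) -> Result<Tensor> {
    if hidden_states.dtype() != DType::F16 {
        return Ok(hidden_states.clone());
    }
    // f16::MAX is 65504; back off by 1000 as transformers does.
    let clamp_value = 65504.0 - 1000.0;
    hidden_states.clamp(-clamp_value, clamp_value)
}
```

In modeling_t5.py this clamp runs after the self-attention and feed-forward blocks of each layer, so the equivalent placement would be inside the T5 block's forward pass in candle.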
Using BF16 instead works on CUDA; see the sketch below.
Interesting find: F16 fails (produces NaN) on an A100, but not an H100.
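Until a clamping fix lands, a sketch of the BF16 workaround at model-load time. This assumes the usual `VarBuilder::from_mmaped_safetensors` loading path; the weight filename is a placeholder:

```rust
use candle_core::{DType, Device};
use candle_nn::VarBuilder;

fn load_t5_vb() -> candle_core::Result<VarBuilder<'static>> {
    let device = Device::new_cuda(0)?;
    // F16 overflows in the T5 XXL encoder on (at least) A100s, so
    // prefer BF16 on CUDA: it has the same exponent range as F32.
    let dtype = if device.is_cuda() { DType::BF16 } else { DType::F32 };
    // Placeholder path for the T5 XXL safetensors weights.
    let files = vec!["t5xxl.safetensors".to_string()];
    unsafe { VarBuilder::from_mmaped_safetensors(&files, dtype, &device) }
}
```

BF16 trades mantissa precision for F32's exponent range, which is why it avoids the overflow-to-NaN path that F16 hits here.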