[Tracking] FLUX T5 XXL model produces NaN when on CUDA and using F16
Perhaps we can use clamping, as transformers does in its T5 implementation:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L748-L755
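For reference, here is a minimal candle-style sketch of that workaround. The function name is hypothetical, and it assumes candle's `Tensor::clamp` accepts scalar bounds; it also simplifies the transformers code, which only backs off by 1000 from `f16::MAX` when infs are actually detected:

```rust
use candle_core::{DType, Result, Tensor};

/// Pull F16 activations back inside the representable range, mirroring
/// the workaround in transformers' modeling_t5.py: squash any overflow
/// before it turns into NaN in the next layer.
fn clamp_f16_hidden_states(hidden_states: &Tensor) -> Result<Tensor> {
    if hidden_states.dtype() != DType::F16 {
        return Ok(hidden_states.clone());
    }
    // f16::MAX is 65504; back off by 1000 as transformers does.
    let clamp_value = 65504.0 - 1000.0;
    hidden_states.clamp(-clamp_value, clamp_value)
}
```

In modeling_t5.py this clamp runs after the self-attention and feed-forward blocks of each layer, so the equivalent placement would be inside the T5 block's forward pass in candle.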
Using BF16 instead works on CUDA; see the sketch below.
Interesting find: F16 fails (produces NaN) on an A100, but not an H100.
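Until a clamping fix lands, a sketch of the BF16 workaround at model-load time. This assumes the usual `VarBuilder::from_mmaped_safetensors` loading path; the weight filename is a placeholder:

```rust
use candle_core::{DType, Device};
use candle_nn::VarBuilder;

fn load_t5_vb() -> candle_core::Result<VarBuilder<'static>> {
    let device = Device::new_cuda(0)?;
    // F16 overflows in the T5 XXL encoder on (at least) A100s, so
    // prefer BF16 on CUDA: it has the same exponent range as F32.
    let dtype = if device.is_cuda() { DType::BF16 } else { DType::F32 };
    // Placeholder path for the T5 XXL safetensors weights.
    let files = vec!["t5xxl.safetensors".to_string()];
    unsafe { VarBuilder::from_mmaped_safetensors(&files, dtype, &device) }
}
```

BF16 trades mantissa precision for F32's exponent range, which is why it avoids the overflow-to-NaN path that F16 hits here.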