Decide what to do about 16bit weights trained with mixed precision
Our training scripts select mixed precision by default (16-mixed or bf16-mixed). Many of the HF pretrained weights come in 16-bit precision (float16 or bfloat16).
Since the weights are already in this dtype, it's not useful to do mixed precision training. We can do one of two things about it:
a) Cast the weights to fp32 in this case so that mixed precision training does something useful.
b) Raise an exception saying that this configuration is not useful.
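For illustration, a minimal sketch of option a), assuming a hypothetical helper that checks parameter dtypes before training starts (this is not the actual Fabric/Lightning implementation; only the "16-mixed"/"bf16-mixed" precision strings come from our scripts):

```python
import torch

def upcast_if_mixed_precision(model: torch.nn.Module, precision: str) -> torch.nn.Module:
    """Hypothetical helper for option a): upcast 16-bit weights to fp32
    so that mixed precision training keeps an fp32 master copy."""
    if precision in ("16-mixed", "bf16-mixed"):
        low_bit = (torch.float16, torch.bfloat16)
        if any(p.dtype in low_bit for p in model.parameters()):
            # Option b) would instead raise here, e.g.:
            # raise ValueError("Weights are already 16-bit; mixed precision training is not useful.")
            model = model.to(torch.float32)
    return model
```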
Note that this change needs to be done in PyTorch Lightning and Fabric, but I'm opening the issue here because this is a common problem in LitGPT, and because we would need to add code to LitGPT to avoid b) if we choose it.