
Auto precision

Open rasbt opened this issue 1 year ago • 2 comments

One small issue I see with the current config files is that we are using bf16-true. In my opinion that's the recommended setting, but certain hardware doesn't support it. In that case we could recommend using --precision 16-true from the command line. However, maybe we could have an "auto" option in the config files, similar to Ollama. I think we already support that via https://github.com/Lightning-AI/litgpt/blob/f241d94df59d82b2017bfdcd3800ac8779eb45f5/lit_gpt/utils.py#L284
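For reference, a minimal sketch of the kind of hardware check such a helper could do, assuming it only needs to query `torch.cuda.is_bf16_supported()`; the actual function name and return values in lit_gpt/utils.py may differ:

```python
import torch


def auto_precision() -> str:
    """Pick "bf16-true" where bfloat16 is supported, "16-true" otherwise.

    Sketch only; the linked utility in lit_gpt/utils.py may use a different
    name and may return other precision strings (e.g. mixed-precision modes).
    """
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return "bf16-true"
    return "16-true"
```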

We would just need to set the value to null in the config file and then document that bf16-true is used when supported and 16-true otherwise.
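As a sketch, a config file could then look like this (hypothetical excerpt, the exact keys depend on the config schema):

```yaml
# Hypothetical finetuning config excerpt.
# null would mean: use bf16-true when the hardware supports it, otherwise 16-true.
precision: null
```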

rasbt avatar Mar 13 '24 17:03 rasbt

This will be an issue for reproducibility. It's not guaranteed that training will give the same results, or even be stable, across precisions. I recommend running the scripts first to show that they converge well.

To avoid having an ambiguous "null" value, we could also raise an error and tell the user to explicitly select precision=16-true if bfloat16 is not supported.
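Roughly something like this (hypothetical helper, not an existing litgpt API):

```python
import torch


def validate_precision(precision: str) -> None:
    # Fail fast instead of silently falling back to another precision.
    if precision == "bf16-true" and not (
        torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    ):
        raise ValueError(
            "bf16-true is not supported on this GPU. "
            "Please rerun with --precision 16-true explicitly."
        )
```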

awaelchli avatar Mar 14 '24 01:03 awaelchli

That's fair; we would have to run the scripts with both fp16 and bf16. But this is not that different from saying "if your GPU does not support --precision bf16-true, run the script with --precision 16-true".

Maybe we should add something like:

"If your GPU is not compatible with --precision bf16-true, you can execute the script using --precision 16-true instead. However, be aware that this adjustment may lead to a decline in performance and the outcomes may vary from the reported results."

rasbt avatar Mar 14 '24 12:03 rasbt