litgpt
Add slow interconnect warning
Many users have asked, or opened issues about, whether there is a bug because multi-GPU training can be slower than single-GPU training. This is not caused by a LitGPT bug; it happens when the machine has slow GPU interconnects.
This PR adds a warning when a slow GPU interconnect is detected and suggests using a different machine for multi-GPU training.
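For reviewers who want a concrete picture of the idea, here is a minimal sketch, not the code in this PR. It assumes "slow interconnect" means that at least one GPU pair is not linked via NVLink in the output of `nvidia-smi topo -m`, and that the output is plain text without terminal escape codes. The function names `all_gpus_nvlinked` and `warn_if_slow_interconnect` are hypothetical.

```python
import re
import subprocess
import warnings

GPU_RE = re.compile(r"GPU\d+")


def all_gpus_nvlinked(topo_output: str) -> bool:
    """Return True if every GPU-to-GPU entry in the topology matrix is NVLink (NV#).

    Hypothetical helper; expects the text printed by `nvidia-smi topo -m`.
    """
    gpu_names = []
    rows = []
    for line in topo_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not gpu_names:
            # The header row lists the GPU columns first (GPU0, GPU1, ...).
            gpu_names = [t for t in tokens if GPU_RE.fullmatch(t)]
            continue
        if GPU_RE.fullmatch(tokens[0]):
            # Keep only the GPU-to-GPU entries, dropping affinity columns.
            rows.append(tokens[1 : 1 + len(gpu_names)])
    if len(gpu_names) < 2:
        return True  # a single GPU has no interconnect to worry about
    # "X" marks the diagonal (a GPU paired with itself).
    return all(e == "X" or e.startswith("NV") for row in rows for e in row)


def warn_if_slow_interconnect() -> None:
    """Warn when not all GPUs are NVLink-connected (assumed criterion)."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "topo", "-m"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return  # no NVIDIA tooling available; skip the check silently
    if not all_gpus_nvlinked(result.stdout):
        warnings.warn(
            "Not all GPUs are connected via NVLink. Multi-GPU training may be "
            "slower than single-GPU training on this machine; consider using "
            "a machine with a faster GPU interconnect."
        )
```

Calling `warn_if_slow_interconnect()` once before launching a multi-GPU run would surface the message early. Parsing is kept in a separate pure function so it can be tested without GPUs.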
CC @apaz-cli
Fixes #1369 Fixes #607 Fixes #1581