litgpt

Add slow interconnect warning

Open rasbt opened this issue 7 months ago • 4 comments

Many users have asked, or opened issues asking, whether there is a bug because multi-GPU training can be slower than single-GPU training. This is not a LitGPT bug; it happens when the machine has a slow GPU interconnect.

This adds a warning when a slow GPU interconnect is detected and suggests using a different machine for multi-GPU training.
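
For illustration only, here is a minimal sketch of how such a check could work. It is not the code from this change: the function names `slow_gpu_pairs` and `warn_if_slow_interconnect` are hypothetical, and the parsing assumes the usual `nvidia-smi topo -m` matrix layout (a header row of `GPU<n>` columns first, one row per GPU, NVLink connections shown as `NV<n>`). Any pair connected by something other than NVLink is treated as "slow", which is an assumption rather than a measured threshold.

```python
import re
import subprocess
import warnings


def slow_gpu_pairs(topo_output: str) -> list[tuple[str, str]]:
    """Return GPU pairs that are not linked via NVLink.

    `topo_output` is assumed to be the text printed by `nvidia-smi topo -m`.
    """
    # Some driver versions underline the header with ANSI escape codes.
    text = re.sub(r"\x1b\[[0-9;]*m", "", topo_output)
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    # Header row: the GPU columns come first, followed by affinity columns.
    gpus = [tok for tok in lines[0].split() if re.fullmatch(r"GPU\d+", tok)]

    pairs = []
    for line in lines[1:]:
        cells = line.split()
        if cells[0] not in gpus:
            continue  # NIC rows, legend lines, etc.
        row = cells[0]
        for col, link in zip(gpus, cells[1:]):
            # Visit each unordered pair once (upper triangle of the matrix).
            if gpus.index(col) > gpus.index(row) and not re.fullmatch(r"NV\d+", link):
                pairs.append((row, col))
    return pairs


def warn_if_slow_interconnect() -> None:
    """Warn if any GPU pair lacks NVLink; do nothing if nvidia-smi is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "topo", "-m"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return
    pairs = slow_gpu_pairs(out)
    if pairs:
        warnings.warn(
            f"GPU pairs without NVLink detected: {pairs}. Multi-GPU training "
            "may be slower than single-GPU training on this machine; "
            "consider a machine with a faster GPU interconnect."
        )
```

Calling `warn_if_slow_interconnect()` once before launching a multi-GPU run would surface the message early; keeping the parsing in a separate pure function makes it easy to test without GPUs.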

CC @apaz-cli

Fixes #1369 Fixes #607 Fixes #1581

rasbt, Jul 12 '24 18:07