
How can I resolve the torch.distributed.elastic.multiprocessing.errors.ChildFailedError raised when running accelerate launch (multi-GPU)?

Open chyiin opened this issue 7 months ago • 0 comments

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

axolotl.cli.train FAILED

chyiin, Jul 09 '24 01:07