stanford_alpaca
Is anyone using a single A100 80GB for training?
I also tried to fine-tune this model on a single A100 GPU, but it failed with the error: "ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 564788) of binary". I guess we may need to change the code from distributed data parallel mode to single-GPU mode.
Not enough GPU RAM — exitcode -9 means the process was killed (SIGKILL), which typically indicates it ran out of memory.
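For a rough sense of why 80 GB is not enough: full fine-tuning with mixed-precision AdamW keeps fp16 weights and gradients plus fp32 master weights and two fp32 optimizer states per parameter. A back-of-the-envelope sketch for a 7B-parameter model (this ignores activations, batch size, and any sharding or offloading, so the real footprint is even larger):

```python
# Back-of-the-envelope memory estimate for full fine-tuning a
# 7B-parameter model with mixed-precision AdamW (sketch only).
params = 7e9
bytes_per_param = (
    2    # fp16 weights
    + 2  # fp16 gradients
    + 4  # fp32 master weights
    + 4  # fp32 Adam first moment (momentum)
    + 4  # fp32 Adam second moment (variance)
)
total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB of optimizer/model state")  # ~112 GB, well above 80 GB
```

So even before activations, the optimizer and model state alone exceed a single A100 80GB, which is why the process gets OOM-killed. Options like gradient checkpointing, CPU offloading, or parameter-efficient methods (e.g. LoRA) are the usual workarounds.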