stanford_alpaca

Is anyone using a single A100 80GB for training?

Open Ahtesham00 opened this issue 1 year ago • 2 comments

Ahtesham00 avatar Apr 12 '23 20:04 Ahtesham00

I also tried to fine-tune this model on a single A100 GPU, but it failed with this error: "ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 564788) of binary". I guess we may need to change the code from distributed data parallel mode to single-GPU mode.
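For anyone reading the error above: a negative exit code usually follows the POSIX convention that `-N` means the process was terminated by signal `N`, so `-9` would be SIGKILL. The sketch below only decodes the number; `describe_exit_code` is a hypothetical helper, not part of PyTorch, and it assumes a POSIX system. SIGKILL is often what the Linux out-of-memory killer sends, but the log line alone does not prove that was the cause here.

```python
import signal


def describe_exit_code(exitcode: int) -> str:
    """Hypothetical helper: turn a process exit code into a readable message.

    Assumes the POSIX convention that a negative code -N means the process
    was terminated by signal N.
    """
    if exitcode < 0:
        return f"killed by {signal.Signals(-exitcode).name}"
    return f"exited with status {exitcode}"


# The exit code reported in the error message above.
print(describe_exit_code(-9))
```

If this points to SIGKILL, checking the kernel log (for example with `dmesg`) for out-of-memory messages is a reasonable next step.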

Somezak1 avatar Apr 21 '23 06:04 Somezak1

Not enough GPU memory.
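A rough back-of-the-envelope check, as a sketch only. It assumes a 7B-parameter model, full fine-tuning in fp32, and an Adam-style optimizer that keeps two extra fp32 values per parameter; none of these are stated in the thread. It also ignores activations, so the real requirement would be higher. `full_finetune_memory_gb` is a hypothetical helper.

```python
def full_finetune_memory_gb(
    n_params: float,
    bytes_weights: int = 4,  # fp32 weights (assumption)
    bytes_grads: int = 4,    # fp32 gradients (assumption)
    bytes_optim: int = 8,    # two fp32 Adam moments per parameter (assumption)
) -> float:
    """Estimate memory for weights, gradients and optimizer state, in GB.

    Activations and framework overhead are deliberately left out.
    """
    bytes_per_param = bytes_weights + bytes_grads + bytes_optim
    return n_params * bytes_per_param / 1e9


estimate = full_finetune_memory_gb(7e9)  # 7B parameters is an assumption
print(f"~{estimate:.0f} GB needed vs. 80 GB available")
```

Under these assumptions the model state alone exceeds a single 80 GB card, which is why lower precision, offloading, or parameter-efficient methods are commonly used on one GPU.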

zhsu-private avatar Apr 25 '23 10:04 zhsu-private