
resume_from_checkpoint

Open anyili opened this issue 1 year ago • 2 comments

For the line at https://github.com/tloen/alpaca-lora/blob/main/finetune.py#L273, why not set resume_from_checkpoint=True? It seems that would restore the trainer state correctly as well as the LoRA weights. Why do we still need the manual loading at https://github.com/tloen/alpaca-lora/blob/main/finetune.py#L191?
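
For reference, a rough sketch of the two paths as I understand them (function and variable names below are illustrative, not copied from finetune.py):

import os
import torch
from peft import set_peft_model_state_dict

def manual_adapter_resume(model, checkpoint_dir):
    # Path around finetune.py#L191: load only the LoRA adapter weights into the
    # already-built PEFT model. Works even when the checkpoint directory
    # contains just adapter_model.bin and no optimizer/scheduler state.
    checkpoint_name = os.path.join(checkpoint_dir, "adapter_model.bin")
    if os.path.exists(checkpoint_name):
        adapters_weights = torch.load(checkpoint_name, map_location="cpu")
        set_peft_model_state_dict(model, adapters_weights)

def trainer_resume(trainer, checkpoint_dir):
    # Path suggested at finetune.py#L273: let transformers.Trainer restore the
    # full training state (model weights, optimizer, scheduler, RNG) from a
    # complete checkpoint-NNNN directory.
    trainer.train(resume_from_checkpoint=checkpoint_dir)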

Thanks!

anyili avatar Apr 19 '23 21:04 anyili

If I set resume_from_checkpoint=saved_model/checkpoint-1000, it throws an exception like: ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cuda', index=0), device(type='cuda', index=1), device(type='cuda', index=2), device(type='cuda', index=3)}.

anyili avatar Apr 19 '23 21:04 anyili

# Map all checkpoint tensors onto a single device when loading the adapter weights:
adapters_weights = torch.load(checkpoint_name, map_location='cuda:0')

lywinged avatar Apr 20 '23 04:04 lywinged
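
The error above lists module parameters on cuda:0 through cuda:3, which suggests the loaded tensors ended up spread across GPUs; pinning map_location to one device keeps each process's parameters on a single GPU before DDP wraps the model. A minimal sketch of how that fix slots into the manual loading path (the LOCAL_RANK lookup and the adapter_model.bin filename are assumptions about a typical torchrun launch, not taken from the repo):

import os
import torch
from peft import set_peft_model_state_dict

def load_adapter_weights_ddp(model, checkpoint_dir):
    # Each process maps the checkpoint tensors onto its own single GPU, so DDP
    # never sees module parameters scattered across several devices.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    checkpoint_name = os.path.join(checkpoint_dir, "adapter_model.bin")
    if os.path.exists(checkpoint_name):
        adapters_weights = torch.load(checkpoint_name, map_location=f"cuda:{local_rank}")
        set_peft_model_state_dict(model, adapters_weights)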