How to run inference after finetuning
How do I run inference after finetuning?
check out this PR: https://github.com/tatsu-lab/stanford_alpaca/pull/199/files
@FruVirus How do we convert the fine-tuned model into a form loadable by the script? What is the command to run?
Thanks for the link!
However, I had some problems when I ran the code on my server with three 3090 GPUs (24 GB VRAM each).
I solved the out-of-memory error by commenting out the line model.cuda().
Then I solved the error "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!" by commenting out the line num_beams=4.
I know model.cuda() moves the whole model to the first GPU.
But what happens when I comment out the line num_beams=4? Why does that fix the error?
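In case it helps anyone hitting the same OOM, here is a minimal sketch of how the checkpoint could be loaded sharded across the three GPUs instead of calling model.cuda(). This assumes the accelerate package is installed; ./output is just a placeholder for the fine-tuning output directory, not a path taken from the PR:

```python
import torch
import transformers

# Placeholder for the fine-tuning output directory (whatever was passed as
# --output_dir during training); not a path from the PR.
CHECKPOINT = "./output"

tokenizer = transformers.AutoTokenizer.from_pretrained(CHECKPOINT)

# device_map="auto" (requires `accelerate`) shards the layers across all
# visible GPUs, so there is no need to call model.cuda(), which would try to
# push the whole model onto cuda:0 and run out of memory on a 24 GB card.
model = transformers.AutoModelForCausalLM.from_pretrained(
    CHECKPOINT,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Placeholder prompt; use whatever prompt format the script builds.
prompt = "### Instruction:\nGive three tips for staying healthy.\n\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

# Plain greedy decoding; with the model spread over several GPUs, beam search
# (num_beams=4) was the part that tripped the cuda:0 / cuda:1 mismatch here.
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```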
It seems like the model is loaded to the device during
transformers.AutoModelForCausalLM.from_pretrained,
and the num_beams error is caused by 'inf', 'nan', or elements < 0 according to my error info. I have no idea about that.
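To confirm that, a quick sketch for checking where the weights actually end up right after from_pretrained (before any .cuda() call); ./output is again just a placeholder path:

```python
import collections
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "./output",          # placeholder for the fine-tuned checkpoint
    device_map="auto",   # only relevant if the script loads with a device map
)

# Count how many parameters live on each device; seeing more than one CUDA
# device here is what leads to the "Expected all tensors to be on the same
# device" error when an op mixes tensors from different GPUs.
print(collections.Counter(str(p.device) for p in model.parameters()))

# When accelerate dispatched the model, this shows the layer -> device mapping.
print(getattr(model, "hf_device_map", None))
```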