stanford_alpaca
Add inference code
Tested with my own fine-tuned 7B Alpaca model.
python inference.py \
--model_name_or_path {model_path}
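For reference, a minimal sketch of what such an inference script might look like (not the PR's actual code — the prompt template below is the standard Alpaca instruction format, and the generation settings are placeholder assumptions):

```python
import argparse

# Standard Alpaca prompt template for instruction-only inputs
# (assumption: the model was fine-tuned with this exact format).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the Alpaca training prompt format."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

def run_inference() -> None:
    """Load the model and generate a response for a fixed test instruction."""
    # Heavy imports live here so build_prompt stays importable without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name_or_path", required=True)
    args = parser.parse_args()

    tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path)
    model = AutoModelForCausalLM.from_pretrained(
        args.model_name_or_path, torch_dtype=torch.float16
    ).cuda()
    model.eval()

    prompt = build_prompt("Tell me about alpacas.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=256, num_beams=4)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

# In the real script you would add: if __name__ == "__main__": run_inference()
```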
Instruction: Tell me about alpacas.
| Token ID | Token | Log-prob | Prob |
| 2499 | Al | -15.960 | 0.00%
| 29886 | p | -33.403 | 0.00%
| 562 | ac | -32.065 | 0.00%
| 294 | as | -24.586 | 0.00%
| 526 | are | -20.448 | 0.00%
| 263 | a | -17.845 | 0.00%
| 6606 | species | -16.602 | 0.00%
| 310 | of | -15.564 | 0.00%
| 4275 | South | -11.832 | 0.00%
| 3082 | American | -22.230 | 0.00%
| 3949 | cam | -12.354 | 0.00%
| 295 | el | -34.635 | 0.00%
| 333 | id | -19.849 | 0.00%
| 29892 | , | -20.313 | 0.00%
...
| 29889 | . | -25.931 | 0.00%
| 2 | </s> | -21.040 | 0.00%
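Per-token scores like those above come from applying log-softmax to the model's logits at each decoding step. A minimal pure-Python illustration with toy logits (made-up numbers, not the real model's):

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# Toy vocabulary of three tokens with made-up logits.
logits = [2.0, 0.5, -1.0]
logps = log_softmax(logits)

# Each table row "| id | token | log-prob | prob |" is then one token's score:
for tok_id, (tok, lp) in enumerate(zip(["Al", "p", "ac"], logps)):
    print(f"| {tok_id} | {tok} | {lp:.3f} | {math.exp(lp):.2%} |")
```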
Response: Alpacas are a species of South American camelid, related to the llama. They are smaller than llamas and typically have finer fiber. Alpacas are primarily bred for their fiber, which can be spun into soft and luxurious yarns. They are also used for their meat, which is similar to that of a chicken. Alpacas are social animals and live in herds with a dominant male leader.</s>
...
Largely influenced by https://github.com/kriskrisliu/stanford_alpaca/tree/krisliu
indices = sequences[:, cut_idx:] + beam_sequence_indices
RuntimeError: The size of tensor a (114) must match the size of tensor b (259) at non-singleton dimension 1
Have you met an error like this? @wade3han
No, I didn't encounter that error. Can you give me more context?
Just use:
instructions = [
    # "In the style of Lu Xun, complain about the recent price hikes in the cafeteria food"
    "模仿鲁迅的风格, 吐槽一下最近食堂饭菜涨价",
]
Same problem here:
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
The same error occurs when running inference with both llama-7b-hf and the fine-tuned model.
Cool! The problem has been fixed.
Thanks for the code!
However, I ran into some problems running the code on my server with three RTX 3090 GPUs (24 GB VRAM each).
I fixed the out-of-memory error by commenting out the line model.cuda().
Then I fixed the error "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!" by commenting out the line num_beams=4.
I know model.cuda() moves the whole model to the first GPU.
But what happens when I comment out num_beams=4? Why does that fix the error?
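One plausible reading (an assumption, not verified against the PR's code): without model.cuda(), the weights end up spread across devices, and beam search (num_beams=4) allocates its scoring/bookkeeping tensors on a single device, so the two collide; greedy decoding avoids that allocation path. A sketch of loading the model sharded across GPUs with greedy decoding instead (model path and settings are placeholders):

```python
# Greedy decoding settings; num_beams=1 sidesteps the multi-device
# beam-search issue seen on some older transformers versions.
gen_kwargs = dict(
    max_new_tokens=256,
    num_beams=1,
    do_sample=False,
)

def run_sharded_inference(model_path: str, instruction: str) -> str:
    """Load the model sharded across all visible GPUs and generate greedily."""
    # Heavy imports kept inside the function so the module imports without GPUs.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        device_map="auto",  # shard layers across GPUs instead of model.cuda()
    )
    inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, **gen_kwargs)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

With device_map="auto", each GPU holds a slice of the layers, so a 7B fp16 model fits comfortably in 3×24 GB without manually placing anything.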