
Add inference code

Open wade3han opened this issue 1 year ago • 7 comments

Tested with own fine-tuned 7B alpaca model

python inference.py \
    --model_name_or_path {model_path}
Instruction: Tell me about alpacas.
|  2499 | Al       | -15.960 | 0.00%
| 29886 | p        | -33.403 | 0.00%
|   562 | ac       | -32.065 | 0.00%
|   294 | as       | -24.586 | 0.00%
|   526 | are      | -20.448 | 0.00%
|   263 | a        | -17.845 | 0.00%
|  6606 | species  | -16.602 | 0.00%
|   310 | of       | -15.564 | 0.00%
|  4275 | South    | -11.832 | 0.00%
|  3082 | American | -22.230 | 0.00%
|  3949 | cam      | -12.354 | 0.00%
|   295 | el       | -34.635 | 0.00%
|   333 | id       | -19.849 | 0.00%
| 29892 | ,        | -20.313 | 0.00%
...
| 29889 | .        | -25.931 | 0.00%
|     2 | </s>     | -21.040 | 0.00%
Response:  Alpacas are a species of South American camelid, related to the llama. They are smaller than llamas and typically have finer fiber. Alpacas are primarily bred for their fiber, which can be spun into soft and luxurious yarns. They are also used for their meat, which is similar to that of a chicken. Alpacas are social animals and live in herds with a dominant male leader.</s>
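
The table above pairs each generated token id with what appears to be a log-score; the percentage column rounds to 0.00% because exp() of such large negative values is vanishingly small (e.g. exp(-15.96) ≈ 1e-7). As a hedged, framework-free sketch (toy logits, not the actual inference script), per-token log-probabilities and percentages can be derived with a plain log-softmax:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

# Toy per-step vocabulary logits; real code would use the model's output scores.
logits = [2.0, 1.0, 0.5, -1.0]
logps = log_softmax(logits)
for tok_id, lp in enumerate(logps):
    # Columns mirror the table above: token id, log-prob, probability as %.
    print(f"| {tok_id:5d} | {lp:8.3f} | {math.exp(lp) * 100:.2f}% |")
```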

...

Largely influenced by https://github.com/kriskrisliu/stanford_alpaca/tree/krisliu

wade3han avatar Apr 10 '23 05:04 wade3han

    indices = sequences[:, cut_idx:] + beam_sequence_indices
RuntimeError: The size of tensor a (114) must match the size of tensor b (259) at non-singleton dimension 1
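
As a hedged guess at the cause (not confirmed in this thread): the slice sequences[:, cut_idx:] has one length while the beam index tensor was built for another, which can happen when cut_idx assumes a fixed prompt length but the actual tokenized prompt has a different token count. A framework-free sketch of the mismatch and a padding-based alignment (all names are illustrative):

```python
def add_offsets(tail, offsets, pad_id=0):
    # `tail` stands in for sequences[:, cut_idx:] (one row), `offsets` for the
    # beam index tensor. Elementwise addition requires equal lengths, so we
    # right-pad the shorter list first, mirroring padding input_ids to a
    # common length before combining tensors.
    n = max(len(tail), len(offsets))
    tail = tail + [pad_id] * (n - len(tail))
    offsets = offsets + [0] * (n - len(offsets))
    return [t + o for t, o in zip(tail, offsets)]

print(add_offsets([5, 6, 7], [10, 20, 30, 40, 50]))  # lengths 3 vs 5 -> padded
```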

Have you met an error like this? @wade3han

MrRace avatar Apr 10 '23 09:04 MrRace

No, I didn't encounter that error. Can you give me more context?

wade3han avatar Apr 10 '23 10:04 wade3han

> No, I didn't encounter that error. Can you give me more context?

just use :

instructions = [
        "模仿鲁迅的风格, 吐槽一下最近食堂饭菜涨价",  # "Imitate Lu Xun's style and complain about the recent canteen food price hikes"
    ]

MrRace avatar Apr 10 '23 16:04 MrRace

same problem.

diichen avatar Apr 13 '23 12:04 diichen

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

The same error occurs when running inference with both llama-7b-hf and the fine-tuned model.
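
CUBLAS_STATUS_INVALID_VALUE during a matmul is frequently a downstream symptom of mismatched devices or dtypes rather than a cuBLAS bug; running with CUDA_LAUNCH_BLOCKING=1 usually surfaces the actual failing op. A hypothetical, framework-free sketch of the kind of consistency check that narrows it down (all names and values are illustrative, not from the thread's code):

```python
def check_uniform(pairs):
    """pairs: iterable of (name, device, dtype) tuples for model parameters.
    Returns the names whose (device, dtype) differs from the first entry."""
    pairs = list(pairs)
    if not pairs:
        return []
    _, dev0, dt0 = pairs[0]
    return [name for name, dev, dt in pairs if (dev, dt) != (dev0, dt0)]

# Illustrative parameter placements: the head ended up on a second GPU.
params = [("embed", "cuda:0", "fp16"), ("lm_head", "cuda:1", "fp16")]
print(check_uniform(params))  # lm_head sits on a different device
```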

magnificent1208 avatar Apr 14 '23 07:04 magnificent1208

Cool! The problem has been fixed.

diichen avatar Apr 14 '23 08:04 diichen

Thanks for the code!

However, I ran into some problems running the code on my server with three RTX 3090 GPUs (24 GB VRAM each). I fixed the out-of-memory error by commenting out the line model.cuda(). Then I fixed the error "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!" by commenting out the line num_beams=4, .

I know model.cuda() moves the whole model to the first GPU. But what happens when I comment out the line num_beams=4? Why does that fix the error?
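
A hedged guess at why this helps: with num_beams=4, generate() runs beam search, which carries extra per-beam state (beam scores, reordering indices) across decoding steps; when the model's layers are spread over several GPUs, that state can end up on a different device than the hidden states, triggering the mismatch. Dropping num_beams falls back to greedy decoding, which keeps no such cross-step state. A toy, framework-free sketch of greedy decoding (illustrative only, not the repo's code):

```python
def greedy_decode(step_logits, eos_id=None):
    """step_logits: list of per-step logit lists; pick the argmax each step.
    No beam bookkeeping is kept between steps, unlike beam search."""
    out = []
    for logits in step_logits:
        tok = max(range(len(logits)), key=lambda i: logits[i])
        out.append(tok)
        if tok == eos_id:
            break
    return out

print(greedy_decode([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]))  # -> [1, 0]
```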

BaoBaoGitHub avatar Jul 20 '23 08:07 BaoBaoGitHub