OpenLLaMA Orca Mini model: Expected (head_size % 8 == 0) && (head_size <= 128) to be true
System Info
Docker image: ghcr.io/huggingface/text-generation-inference:0.8
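For context, the container was launched roughly as follows (the port, shared-memory size, volume mount, and model flag are assumptions on my part; only the image tag above is exact):

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:0.8 \
    --model-id psmathur/orca_mini_3b
```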
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Hi, thank you for the repo.
When running the new Orca Mini model (psmathur/orca_mini_3b), an OpenLLaMA 3B fine-tune, both a regular and a streaming request fail with this error:
{"error":"Request failed during generation: Server error: Expected (head_size % 8 == 0) && (head_size <= 128) to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)","error_type":"generation"}
A guess at the root cause follows below. Thank you for looking into this bug.
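Assuming orca_mini_3b inherits the openlm-research/open_llama_3b config (hidden_size = 3200, num_attention_heads = 32), the per-head dimension works out to 100, which is not a multiple of 8, so the flash attention kernel's head_size check fails:

```bash
echo $(( 3200 / 32 ))       # head_size = 100
echo $(( (3200 / 32) % 8 )) # 4, so head_size % 8 == 0 is false
```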
Expected behavior
TGI should output a generation for the 3B model.