
OpenLLama Orca mini model Expected (head_size % 8 == 0) && (head_size <= 128) to be true

llmlover opened this issue 2 years ago · 0 comments

System Info

docker image: ghcr.io/huggingface/text-generation-inference:0.8

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

Hi, thank you for the repo. When using the new Orca Mini model (psmathur/orca_mini_3b), which is an OpenLLaMA 3B model, I get this error for both a normal and a streaming request:

{"error":"Request failed during generation: Server error: Expected (head_size % 8 == 0) && (head_size <= 128) to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)","error_type":"generation"}
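For context, the failing assertion comes from the flash-attention kernel, which only supports head sizes that are multiples of 8 and at most 128. A minimal sketch of why OpenLLaMA 3B trips it, assuming the model's published config values (hidden_size=3200, num_attention_heads=32):

```python
# Hedged sketch: recompute the per-head dimension for OpenLLaMA 3B and apply
# the same check the flash-attention kernel asserts. The config values below
# are from the published OpenLLaMA 3B config, not read from TGI itself.
hidden_size = 3200
num_attention_heads = 32

head_size = hidden_size // num_attention_heads  # 3200 // 32 = 100

# Kernel requirement: (head_size % 8 == 0) && (head_size <= 128)
supported = (head_size % 8 == 0) and (head_size <= 128)

print(head_size, supported)  # 100 False — 100 is not a multiple of 8
```

So the 3B variant's head size of 100 violates the divisibility requirement, while the 7B and 13B OpenLLaMA variants (head size 128) pass it.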

Thank you for looking into the bug.

Expected behavior

TGI should output a generation for the 3B model.

llmlover · Jul 01 '23 16:07