OpenLLaMA Orca Mini model: Expected (head_size % 8 == 0) && (head_size <= 128) to be true
System Info
Docker image: ghcr.io/huggingface/text-generation-inference:0.8
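For context, the container was launched roughly as follows (the port, shared-memory size, volume mount, and model flag are assumptions on my part; only the image tag above is exact):

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:0.8 \
    --model-id psmathur/orca_mini_3b
```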
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Hi, thank you for the repo.
When running the new Orca Mini model (psmathur/orca_mini_3b), an OpenLLaMA 3B fine-tune, both a regular and a streaming request fail with this error:
{"error":"Request failed during generation: Server error: Expected (head_size % 8 == 0) && (head_size <= 128) to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)","error_type":"generation"}
A guess at the root cause follows below. Thank you for looking into this bug.
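Assuming orca_mini_3b inherits the openlm-research/open_llama_3b config (hidden_size = 3200, num_attention_heads = 32), the per-head dimension works out to 100, which is not a multiple of 8, so the flash attention kernel's head_size check fails:

```bash
echo $(( 3200 / 32 ))       # head_size = 100
echo $(( (3200 / 32) % 8 )) # 4, so head_size % 8 == 0 is false
```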
Expected behavior
TGI should output a generation for the 3B model.