mistral.rs
Server crashes while processing 2 concurrent requests
Describe the bug
If two requests are sent to the server at roughly the same time, it starts responding to both and then crashes with the following error message:
ERROR mistralrs_core::engine: completion - Model failed with error: ShapeMismatchCat { dim: 0, first_shape: [2, 32, 111, 96], n: 2, nth_shape: [4, 32, 1, 96] }
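For readers unfamiliar with this kind of error: it reports a concatenation along dimension 0 in which the two inputs disagree in another dimension (111 vs. 1 at index 2). Concatenation generally requires every dimension except the one being joined to match. A minimal sketch of that rule in plain Python follows; `cat_shape` is a hypothetical helper written for illustration, not part of mistral.rs or its tensor library, and it says nothing about why the engine produced these shapes.

```python
def cat_shape(shapes, dim):
    """Return the shape that results from concatenating tensors along `dim`.

    Hypothetical illustration only: raises ValueError when any dimension
    other than `dim` differs from the first shape.
    """
    first = list(shapes[0])
    total = first[dim]
    for n, shape in enumerate(shapes[1:], start=1):
        shape = list(shape)
        if len(shape) != len(first):
            raise ValueError(f"rank mismatch: {first} vs {shape}")
        for axis, (a, b) in enumerate(zip(first, shape)):
            # Only the concatenation axis is allowed to differ.
            if axis != dim and a != b:
                raise ValueError(
                    f"shape mismatch in cat: dim={dim}, "
                    f"first_shape={first}, input={n}, shape={shape}"
                )
        total += shape[dim]
    result = first[:]
    result[dim] = total
    return result


# The shapes from the log above cannot be joined along dim 0,
# because index 2 differs (111 vs. 1).
try:
    cat_shape([[2, 32, 111, 96], [4, 32, 1, 96]], dim=0)
except ValueError as err:
    print(err)
```

With matching trailing dimensions, for example `[2, 32, 111, 96]` and `[4, 32, 111, 96]`, the same call would succeed and give a leading dimension of 6.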
docker-compose file used:
version: '3.8'
services:
  text-generation:
    image: ghcr.io/ericlbuehler/mistral.rs:cuda-89-latest
    ports:
      - 12005:80
    volumes:
      - /data/hf-cache:/data:z
    command: --isq Q4K plain -m microsoft/Phi-3-mini-128k-instruct -a phi3
    environment:
      - HUGGING_FACE_HUB_TOKEN=[TOKEN]
      - KEEP_ALIVE_INTERVAL=100
    healthcheck:
      test: curl --fail http://localhost/health || exit 1
      interval: 30s
      retries: 5
      start_period: 300s
      timeout: 10s
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
This could also be an issue specific to Phi-3; I have to do some further testing.
Latest commit 4ffe68d
This should not be a problem with the phi3 model specifically. I'll look into what could be the cause.
I was able to reproduce the error by running the following in quick succession:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "",
    "messages": [
      {
        "role": "system",
        "content": "You are Mistral.rs, an AI assistant."
      },
      {
        "role": "user",
        "content": "Write a story about Rust error handling."
      }
    ]
  }' &