mistral.rs

Server crashes while processing 2 concurrent requests

Open • LLukas22 opened this issue 9 months ago • 1 comment

Describe the bug

If two requests are sent to the server at roughly the same time, it starts responding to both and then crashes with the following error message:

ERROR mistralrs_core::engine: completion - Model failed with error: ShapeMismatchCat { dim: 0, first_shape: [2, 32, 111, 96], n: 2, nth_shape: [4, 32, 1, 96] }
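The error means a tensor concatenation along dimension 0 received inputs whose other dimensions disagree: the first tensor is [2, 32, 111, 96] and the tensor at index n = 2 is [4, 32, 1, 96]. Concatenating along dim 0 only allows dim 0 to differ, so the failing dimension is dim 2 (111 vs 1). If these are attention tensors in a [batch, heads, seq_len, head_dim] layout (an assumption; 32 heads × 96 = 3072 matches Phi-3-mini's hidden size), that would mean sequences at different lengths ended up in the same batch. The hypothetical bash helper below only illustrates the shape rule; it is not mistral.rs or candle code.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the rule behind ShapeMismatchCat:
# tensors concatenated along DIM must have the same rank and agree on
# every dimension except DIM. Shapes are comma-separated strings.
# Usage: cat_compatible DIM FIRST_SHAPE NTH_SHAPE
cat_compatible() {
  local dim=$1 first=$2 nth=$3
  local -a a b
  IFS=',' read -r -a a <<< "$first"
  IFS=',' read -r -a b <<< "$nth"

  # Different ranks can never be concatenated.
  if (( ${#a[@]} != ${#b[@]} )); then
    echo "rank mismatch: ${#a[@]} vs ${#b[@]}"
    return 1
  fi

  # Every dimension other than the concatenation dimension must match.
  local i
  for (( i = 0; i < ${#a[@]}; i++ )); do
    if (( i != dim && a[i] != b[i] )); then
      echo "mismatch at dim $i: ${a[i]} vs ${b[i]}"
      return 1
    fi
  done
  echo "ok"
}

# Shapes taken from the error message above.
cat_compatible 0 "2,32,111,96" "4,32,1,96"   # prints: mismatch at dim 2: 111 vs 1
```

Dim 0 (2 vs 4) is allowed to differ because it is the concatenation dimension; the check fails at the first other dimension that disagrees.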

Docker Compose file used:

version: '3.8'

services:
  text-generation:
    image: ghcr.io/ericlbuehler/mistral.rs:cuda-89-latest
    ports:
      - 12005:80
    volumes:
      - /data/hf-cache:/data:z
    command: --isq Q4K plain -m microsoft/Phi-3-mini-128k-instruct -a phi3
    environment:
      - HUGGING_FACE_HUB_TOKEN=[TOKEN]
      - KEEP_ALIVE_INTERVAL=100
    healthcheck:
      test: curl --fail http://localhost/health || exit 1
      interval: 30s
      retries: 5
      start_period: 300s
      timeout: 10s
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
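After starting this service (for example with docker compose up -d), the server may take a while to become ready, so it can help to wait for it before sending test requests. A minimal sketch, assuming curl is available on the host: the function name wait_for_health and its defaults are hypothetical; the /health path and host port 12005 come from the healthcheck and port mapping above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a health endpoint until curl --fail succeeds
# (curl --fail treats HTTP status >= 400 as an error) or attempts run out.
# Usage: wait_for_health [URL] [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  local url=${1:-http://localhost:12005/health}
  local attempts=${2:-30}
  local delay=${3:-10}
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if curl --silent --fail --output /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Only sleep if another attempt will follow.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
```

For example, wait_for_health && echo ready blocks for up to roughly five minutes with the defaults before giving up.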

This could also be an issue specific to Phi-3; I have to do some further testing.

Latest commit: 4ffe68d

Posted by LLukas22, Apr 30 '24 11:04

This should not be a problem with the phi3 model specifically. I'll look into what could be the cause.

Posted by EricLBuehler, Apr 30 '24 12:04

I was able to reproduce the error by running the following command twice in quick succession (the trailing & runs each request in the background, so the two requests overlap).

curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer EMPTY" \
-d '{
"model": "",
"messages": [
{
    "role": "system",
    "content": "You are Mistral.rs, an AI assistant."
},
{
    "role": "user",
    "content": "Write a story about Rust error handling."
}
]
}' &
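A sketch that makes the overlap explicit instead of relying on typing the command twice: it launches N copies of the same request in the background, waits for all of them, and prints each HTTP status code (000 means curl got no HTTP response, for example because the connection was refused or dropped). fire_concurrent, BASE_URL and N are hypothetical names; the endpoint, headers and payload are copied from the command above.

```bash
#!/usr/bin/env bash
# Sketch: send several identical chat-completion requests at once and print
# the HTTP status code of each. BASE_URL and N are hypothetical knobs.

PAYLOAD='{
  "model": "",
  "messages": [
    {"role": "system", "content": "You are Mistral.rs, an AI assistant."},
    {"role": "user", "content": "Write a story about Rust error handling."}
  ]
}'

# Usage: fire_concurrent BASE_URL COUNT
fire_concurrent() {
  local base_url=$1 count=$2 i
  for (( i = 1; i <= count; i++ )); do
    # Each curl runs in the background (&) so the requests overlap in time.
    # --write-out prints only the status code; 000 means no HTTP response.
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer EMPTY" \
      -d "$PAYLOAD" \
      "$base_url/v1/chat/completions" &
  done
  wait  # block until every background request has finished
}

fire_concurrent "${BASE_URL:-http://localhost:8080}" "${N:-2}"
```

With the compose file from the original report, the host port would be 12005 rather than 8080, e.g. BASE_URL=http://localhost:12005.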

Posted by EricLBuehler, May 07 '24 21:05