
Error in Gemma 2 using model_worker (probably an error in conversation.py)

Open vikrantrathore opened this issue 1 year ago • 3 comments

When using model_worker with transformers to run the Gemma 2 9B model, it does not work correctly: the conversation template applied to the Gemma 2 model causes it to keep generating a response until model_worker is killed with CTRL+C.

Probably an error in https://github.com/lm-sys/FastChat/blob/92a6d1fcd69a88ea169c0b01065ce44f1e690a2c/fastchat/conversation.py#L48
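For context, Gemma 2 wraps each turn in `<start_of_turn>`/`<end_of_turn>` markers, and generation must stop at `<end_of_turn>`. A minimal illustrative sketch of that format (the function name here is made up for illustration; it is not FastChat's actual Conversation API):

```python
# Illustrative sketch of the Gemma 2 chat format. If the conversation
# template does not emit/register <end_of_turn> as a stop string, the
# model keeps generating tokens indefinitely.

GEMMA_STOP = "<end_of_turn>"

def build_gemma_prompt(messages):
    """Render a list of {"role", "content"} dicts into a Gemma 2 prompt."""
    parts = []
    for msg in messages:
        # Gemma uses "model" (not "assistant") as the responder role name.
        role = "model" if msg["role"] == "assistant" else msg["role"]
        parts.append(f"<start_of_turn>{role}\n{msg['content']}{GEMMA_STOP}\n")
    parts.append("<start_of_turn>model\n")  # cue the model to answer
    return "".join(parts)

prompt = build_gemma_prompt([{"role": "user", "content": "Hi"}])
```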

Following are the details:

  1. Start the controller:
python -m fastchat.serve.controller
  2. Start the model_worker:
 python -m fastchat.serve.model_worker --model-path ~/llm_models/gemma/gemma-2-9b-it/ --model-name gemma-2-9b-it --max-gpu-memory 22GB

2024-07-22 04:15:09 | INFO | model_worker | Loading the model ['gemma-2-9b-it'] on worker a7fb425b ...
Loading checkpoint shards:   0%|                                       | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards:  25%|███████▊                       | 1/4 [00:01<00:03,  1.23s/it]
Loading checkpoint shards:  50%|███████████████▌               | 2/4 [00:01<00:01,  1.10it/s]
Loading checkpoint shards:  75%|███████████████████████▎       | 3/4 [00:02<00:00,  1.06it/s]
Loading checkpoint shards: 100%|███████████████████████████████| 4/4 [00:03<00:00,  1.27it/s]
Loading checkpoint shards: 100%|███████████████████████████████| 4/4 [00:03<00:00,  1.16it/s]
2024-07-22 04:15:13 | ERROR | stderr |
2024-07-22 04:15:16 | INFO | model_worker | Register to controller
2024-07-22 04:15:16 | ERROR | stderr | INFO:     Started server process [47589]
2024-07-22 04:15:16 | ERROR | stderr | INFO:     Waiting for application startup.
2024-07-22 04:15:16 | ERROR | stderr | INFO:     Application startup complete.
2024-07-22 04:15:16 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:21002 (Press CTRL+C to quit)
2024-07-22 04:46:34 | INFO | model_worker | Send heart beat. Models: ['gemma-2-9b-it']. Semaphore: None. call_ct: 0. worker_id: 0deb2443.
  3. Start the OpenAI-compatible API server:
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8080 --api-keys sk-testingfschat
  4. Query the server for available models:
curl http://127.0.0.1:8080/v1/models -H "Authorization: Bearer sk-testingfschat"

It returns

{"object":"list","data":[{"id":"gemma-2-9b-it","object":"model","created":1721623876,"owned_by":"fastchat","root":"gemma-2-9b-it","parent":null,"permission":[{"id":"modelperm-rdtuaWfwAHKMFuUPynj6iK","object":"model_permission","created":1721623876,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":true,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}⏎
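As a sanity check that the worker registered correctly, the model id can be pulled out of that response. A small sketch using an abbreviated copy of the payload above (trimmed to the fields being inspected):

```python
import json

# Abbreviated /v1/models payload from the server, keeping only the
# fields we inspect; the real response also carries "permission" details.
payload = json.loads(
    '{"object":"list","data":[{"id":"gemma-2-9b-it","object":"model",'
    '"created":1721623876,"owned_by":"fastchat","root":"gemma-2-9b-it",'
    '"parent":null}]}'
)

model_ids = [m["id"] for m in payload["data"]]
print(model_ids)  # ['gemma-2-9b-it']
```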

  5. Send a streaming chat request with the prompt "Hi". The server keeps streaming:

HiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHi... (truncated; the stream repeats "Hi" until model_worker is killed with CTRL+C)

This response is wrong; it should be:

Hi there! 👋 What can I do for you today? 😊

This error occurs only in model_worker; something seems wrong with the gemma conversation template and how it is applied to Gemma 2. Interestingly, vllm_worker and sglang_worker work fine with Gemma 2 models.
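To illustrate the suspected mechanism (this is a toy sketch, not FastChat internals): a streaming worker must cut generation as soon as the stop marker appears in the accumulated output, otherwise only a token limit or CTRL+C ends the stream.

```python
# Toy model of a streaming loop with stop-string handling, showing why a
# missing/ignored stop marker produces the runaway "HiHiHi..." output.

def stream_with_stop(token_stream, stop="<end_of_turn>", max_tokens=50):
    """Accumulate streamed text, truncating at the stop marker."""
    text = ""
    for i, token in enumerate(token_stream):
        text += token
        if stop in text:
            return text.split(stop)[0]  # correct behavior: stop here
        if i + 1 >= max_tokens:
            break  # without a stop match, only the token cap ends the loop
    return text

# A well-behaved stream ends at the stop marker...
good = stream_with_stop(iter(["Hi", " there!", "<end_of_turn>", "Hi"]))
# ...but if the template's stop string never matches, output runs to the cap.
bad = stream_with_stop(iter(["Hi"] * 1000), stop="<never-matches>")
```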

vikrantrathore avatar Jul 22 '24 04:07 vikrantrathore