
[Bug] Llama 3.1 Support

Open vladrad opened this issue 1 year ago • 17 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Running into errors when serving the latest llama3.1 AWQ model with the latest docker image. I believe support may need to be added for this model.

Reproduction

docker run --runtime nvidia --gpus '"device=2"' -v ~/.cache/huggingface:/root/.cache/huggingface --env "HUGGING_FACE_HUB_TOKEN=TOKEN" -p 23333:23333 --ipc=host openmmlab/lmdeploy:latest lmdeploy serve api_server hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --backend turbomind --model-format awq

Environment

Latest docker cloned

Error traceback

No response

vladrad avatar Jul 24 '24 00:07 vladrad

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.

AllentDan avatar Jul 24 '24 02:07 AllentDan

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.

Are the 70B and 405B models also supported? @AllentDan

medwang1 avatar Jul 24 '24 03:07 medwang1

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.

The server starts successfully, but I got this error during a conversation:

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/asyncio/locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7985a007bf10

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
  |     return await self.app(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
  |     raise exc
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
  |     await self.app(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
  |     await route.handle(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
  |     await self.app(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
  |     await response(scope, receive, send)
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 258, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/openai/api_server.py", line 504, in completion_stream_generator
    |     async for res in result_generator:
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 571, in generate
    |     prompt_input = await self._get_prompt_input(prompt,
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 524, in _get_prompt_input
    |     input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 600, in encode
    |     return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/lmdeploy/tokenizer.py", line 366, in encode
    |     encoded = self.model.encode(s,
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2715, in encode
    |     encoded_inputs = self.encode_plus(
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3127, in encode_plus
    |     return self._encode_plus(
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 601, in _encode_plus
    |     batched_output = self._batch_encode_plus(
    |   File "/home/user777/miniconda3/envs/lmdeploy/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 528, in _batch_encode_plus
    |     encodings = self._tokenizer.encode_batch(
    | TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
    +------------------------------------

Yoosu-L avatar Jul 24 '24 03:07 Yoosu-L

Sorry all, this could have been written better, but @Yoosu-L is right. The error only happens when you chat with the model once it is up and running. It seems the chat template is slightly different.

vladrad avatar Jul 24 '24 04:07 vladrad

We are working on the llama3 rope. Stay tuned.

lvhan028 avatar Jul 24 '24 05:07 lvhan028

https://github.com/InternLM/lmdeploy/pull/2122 works for llama3.1

lvhan028 avatar Jul 24 '24 06:07 lvhan028

@lvhan028 The issue is still present. The completions endpoint works fine, but chat_completions does not: TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

I have pulled the latest commit and built the docker image locally. I am using hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4. Could you try this on your end and see if chat_completions is working? And if not, is there a way to make it work?
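For reference, a minimal sketch of a request that exercises the chat_completions endpoint (assuming the server is on the default port 23333 and the model name matches what /v1/models reports):

curl http://localhost:23333/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4", "messages": [{"role": "user", "content": "Hello"}]}'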

Ichigo3766 avatar Jul 24 '24 18:07 Ichigo3766

I was able to create a chat template myself and got it working.

Ichigo3766 avatar Jul 24 '24 19:07 Ichigo3766

I was able to create a chat template myself and got it working.

Can you maybe share it?

feuler avatar Jul 24 '24 19:07 feuler

I was able to create a chat template myself and got it working.

Can you maybe share it?

Maybe this is helpful. https://ollama.com/library/llama3.1/blobs/8cf247399e57

thiner avatar Jul 25 '24 02:07 thiner

Oh, I thought I had shared it. Will do soon.

Ichigo3766 avatar Jul 25 '24 03:07 Ichigo3766

Create a JSON file, paste the following into it, and then when loading the model pass --chat-template with the path to that file (an example invocation is sketched after the template below).

{
    "model_name": "int",
    "system": "<|start_header_id|>system<|end_header_id|>\n\n",
    "meta_instruction": "A chat between a user and an assistant.",
    "eosys": "<|eot_id|>",
    "user": "<|start_header_id|>user<|end_header_id|>\n\n",
    "eoh": "<|eot_id|>",
    "assistant": "<|start_header_id|>assistant<|end_header_id|>\n\n",
    "eoa": "<|eot_id|>",
    "separator": "\n\n",
    "capability": "chat",
    "stop_words": ["<|eot_id|>"]
}
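As a usage sketch (the file name llama31_chat_template.json is just an example; the model path is whichever checkpoint you are serving), the server can then be started with:

lmdeploy serve api_server hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 --backend turbomind --model-format awq --chat-template ./llama31_chat_template.json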

Ichigo3766 avatar Jul 25 '24 04:07 Ichigo3766

Hi, @Ichigo3766. The chat template is being supported in PR #2123. We are going to support llama3.1 tool calling! Stay tuned

lvhan028 avatar Jul 25 '24 05:07 lvhan028

What is the error? I did not encounter any error when quantizing the llama3.1-8b-instruct model.

The server starts successfully, but I got this error during a conversation:

TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

My error is as follows:

(lmdeploy) root@intern-studio-073772:~# lmdeploy serve gradio /root/share/new_models/meta-llama/Meta-Llama-3
Meta-Llama-3-8B/              Meta-Llama-3-8B-Instruct/     Meta-Llama-3.1-405B-Instruct/ Meta-Llama-3___1-8B-Instruct/
(lmdeploy) root@intern-studio-073772:~# lmdeploy serve gradio /root/share/new_models/meta-llama/Meta-Llama-3___1-8B-Instruct/
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-07-26 00:42:08,394 - lmdeploy - WARNING - AutoConfig.from_pretrained failed for /root/share/new_models/meta-llama/Meta-Llama-3___1-8B-Instruct/. Exception: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}




2024-07-26 00:42:47,944 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-07-26 00:42:47,944 - lmdeploy - INFO - input chat_template_config=ChatTemplateConfig(model_name=None, system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-07-26 00:42:48,145 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='base', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability='chat', stop_words=None)
2024-07-26 00:42:48,145 - lmdeploy - INFO - model_source: hf_model
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
2024-07-26 00:42:48,169 - lmdeploy - WARNING - The current version of `transformers` is transformers==4.41.1, which is lower than the required version transformers==4.42.3. Please upgrade to the required version.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-26 00:42:48,929 - lmdeploy - INFO - model_config:

[llama]
model_name = base
model_arch = LlamaForCausalLM
tensor_para_size = 1
head_num = 32
kv_head_num = 8
vocab_size = 128256
num_layer = 32
inter_size = 14336
norm_eps = 1e-05
attn_bias = 0
start_id = 128000
end_id = 128009
session_len = 131080
weight_type = bf16
rotary_embedding = 128
rope_theta = 500000.0
size_per_head = 128
group_size = 0
max_batch_size = 128
max_context_token_num = 1
step_length = 1
cache_max_entry_count = 0.8
cache_block_seq_len = 64
cache_chunk_size = -1
enable_prefix_caching = False
num_tokens_per_iter = 8192
max_prefill_iters = 17
extra_tokens_per_iter = 0
use_context_fmha = 1
quant_policy = 0
max_position_embeddings = 131072
rope_scaling_factor = 8.0
use_dynamic_ntk = 0
use_logn_attn = 0
lora_policy = 
lora_r = 0
lora_scale = 0.0
lora_max_wo_r = 0
lora_rank_pattern = 
lora_scale_pattern = 


[TM][WARNING] [LlamaTritonModel] `max_context_token_num` = 131080.
2024-07-26 00:42:49,780 - lmdeploy - WARNING - get 227 model params
2024-07-26 00:44:04,509 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_name=None, model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] NCCL group_id = 0
[TM][INFO] [BlockManager] block_size = 8 MB
[TM][INFO] [BlockManager] max_block_count = 1571
[TM][INFO] [BlockManager] chunk_size = 1571
[TM][WARNING] No enough blocks for `session_len` (131080), `session_len` truncated to 100544.
[TM][INFO] LlamaBatch<T>::Start()
server is gonna mount on: http://0.0.0.0:6006
IMPORTANT: You are using gradio version 4.16.0, however version 4.29.0 is available, please upgrade.
--------
Running on local URL:  http://0.0.0.0:6006

To create a public link, set `share=True` in `launch()`.
2024-07-26 00:46:32,536 - lmdeploy - INFO - prompt='你是谁?', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=1830898949034541761, stop_words=None, bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[57668, 21043, 112471, 11571], adapter_name=None.
2024-07-26 00:46:32,536 - lmdeploy - INFO - session_id=1, history_tokens=0, input_tokens=4, max_new_tokens=512, seq_start=True, seq_end=False, step=0, prep=True
2024-07-26 00:46:32,537 - lmdeploy - INFO - Register stream callback for 1
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 1 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 4, max_q = 4, max_k = 4
[TM][INFO] ------------------------- step = 10 -------------------------
[TM][INFO] ------------------------- step = 20 -------------------------
[TM][INFO] ------------------------- step = 30 -------------------------
[TM][INFO] ------------------------- step = 40 -------------------------
[TM][INFO] ------------------------- step = 50 -------------------------
[TM][INFO] ------------------------- step = 60 -------------------------
[TM][INFO] ------------------------- step = 70 -------------------------
[TM][INFO] ------------------------- step = 80 -------------------------
[TM][INFO] ------------------------- step = 90 -------------------------
[TM][INFO] ------------------------- step = 100 -------------------------
[TM][INFO] ------------------------- step = 110 -------------------------
[TM][INFO] ------------------------- step = 120 -------------------------
[TM][INFO] ------------------------- step = 130 -------------------------
[TM][INFO] ------------------------- step = 140 -------------------------
[TM][INFO] ------------------------- step = 150 -------------------------
[TM][INFO] ------------------------- step = 160 -------------------------
[TM][INFO] ------------------------- step = 170 -------------------------
[TM][INFO] ------------------------- step = 180 -------------------------
[TM][INFO] ------------------------- step = 190 -------------------------
[TM][INFO] ------------------------- step = 200 -------------------------
[TM][INFO] ------------------------- step = 210 -------------------------
[TM][INFO] ------------------------- step = 220 -------------------------
[TM][INFO] ------------------------- step = 230 -------------------------
[TM][INFO] ------------------------- step = 240 -------------------------
[TM][INFO] ------------------------- step = 250 -------------------------
[TM][INFO] ------------------------- step = 260 -------------------------
[TM][INFO] ------------------------- step = 270 -------------------------
[TM][INFO] ------------------------- step = 280 -------------------------
[TM][INFO] ------------------------- step = 290 -------------------------
[TM][INFO] ------------------------- step = 300 -------------------------
[TM][INFO] ------------------------- step = 310 -------------------------
[TM][INFO] ------------------------- step = 320 -------------------------
[TM][INFO] ------------------------- step = 330 -------------------------
[TM][INFO] ------------------------- step = 340 -------------------------
[TM][INFO] ------------------------- step = 350 -------------------------
[TM][INFO] ------------------------- step = 360 -------------------------
[TM][INFO] ------------------------- step = 370 -------------------------
[TM][INFO] ------------------------- step = 380 -------------------------
[TM][INFO] ------------------------- step = 390 -------------------------
[TM][INFO] ------------------------- step = 400 -------------------------
[TM][INFO] ------------------------- step = 410 -------------------------
[TM][INFO] ------------------------- step = 420 -------------------------
[TM][INFO] ------------------------- step = 430 -------------------------
[TM][INFO] ------------------------- step = 440 -------------------------
[TM][INFO] ------------------------- step = 450 -------------------------
[TM][INFO] ------------------------- step = 460 -------------------------
[TM][INFO] ------------------------- step = 470 -------------------------
[TM][INFO] ------------------------- step = 480 -------------------------
[TM][INFO] ------------------------- step = 490 -------------------------
[TM][INFO] ------------------------- step = 500 -------------------------
[TM][INFO] ------------------------- step = 510 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 1
[TM][INFO] [forward] Request completed for 1
2024-07-26 00:46:40,480 - lmdeploy - INFO - UN-register stream callback for 1
2024-07-26 00:46:42,003 - lmdeploy - INFO - prompt='什么情况', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[101879, 106041], adapter_name=None.
2024-07-26 00:46:42,003 - lmdeploy - INFO - session_id=1, history_tokens=517, input_tokens=2, max_new_tokens=512, seq_start=False, seq_end=False, step=0, prep=True
2024-07-26 00:46:42,003 - lmdeploy - INFO - Register stream callback for 1
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 1 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 3, max_q = 3, max_k = 519
[TM][INFO] ------------------------- step = 520 -------------------------
[TM][INFO] ------------------------- step = 530 -------------------------
[TM][INFO] ------------------------- step = 540 -------------------------
[TM][INFO] ------------------------- step = 550 -------------------------
[TM][INFO] ------------------------- step = 560 -------------------------
[TM][INFO] ------------------------- step = 570 -------------------------
[TM][INFO] ------------------------- step = 580 -------------------------
[TM][INFO] ------------------------- step = 590 -------------------------
[TM][INFO] ------------------------- step = 600 -------------------------
[TM][INFO] ------------------------- step = 610 -------------------------
[TM][INFO] ------------------------- step = 620 -------------------------
[TM][INFO] ------------------------- step = 630 -------------------------
[TM][INFO] ------------------------- step = 640 -------------------------
[TM][INFO] ------------------------- step = 650 -------------------------
[TM][INFO] ------------------------- step = 660 -------------------------
[TM][INFO] ------------------------- step = 670 -------------------------
[TM][INFO] ------------------------- step = 680 -------------------------
[TM][INFO] ------------------------- step = 690 -------------------------
[TM][INFO] ------------------------- step = 700 -------------------------
[TM][INFO] ------------------------- step = 710 -------------------------
[TM][INFO] ------------------------- step = 720 -------------------------
[TM][INFO] ------------------------- step = 730 -------------------------
[TM][INFO] ------------------------- step = 740 -------------------------
[TM][INFO] ------------------------- step = 750 -------------------------
[TM][INFO] ------------------------- step = 760 -------------------------
[TM][INFO] ------------------------- step = 770 -------------------------
[TM][INFO] ------------------------- step = 780 -------------------------
[TM][INFO] ------------------------- step = 790 -------------------------
[TM][INFO] ------------------------- step = 800 -------------------------
[TM][INFO] ------------------------- step = 810 -------------------------
[TM][INFO] ------------------------- step = 820 -------------------------
[TM][INFO] ------------------------- step = 830 -------------------------
[TM][INFO] ------------------------- step = 840 -------------------------
[TM][INFO] ------------------------- step = 850 -------------------------
[TM][INFO] ------------------------- step = 860 -------------------------
[TM][INFO] ------------------------- step = 870 -------------------------
[TM][INFO] ------------------------- step = 880 -------------------------
[TM][INFO] ------------------------- step = 890 -------------------------
[TM][INFO] ------------------------- step = 900 -------------------------
[TM][INFO] ------------------------- step = 910 -------------------------
[TM][INFO] ------------------------- step = 920 -------------------------
[TM][INFO] ------------------------- step = 930 -------------------------
[TM][INFO] ------------------------- step = 940 -------------------------
[TM][INFO] ------------------------- step = 950 -------------------------
[TM][INFO] ------------------------- step = 960 -------------------------
[TM][INFO] ------------------------- step = 970 -------------------------
[TM][INFO] ------------------------- step = 980 -------------------------
[TM][INFO] ------------------------- step = 990 -------------------------
[TM][INFO] ------------------------- step = 1000 -------------------------
[TM][INFO] ------------------------- step = 1010 -------------------------
[TM][INFO] ------------------------- step = 1020 -------------------------
[TM][INFO] ------------------------- step = 1030 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 1
[TM][INFO] [forward] Request completed for 1
2024-07-26 00:46:48,420 - lmdeploy - INFO - UN-register stream callback for 1



2024-07-26 00:47:04,889 - lmdeploy - INFO - prompt='挂了', gen_config=EngineGenerationConfig(n=1, max_new_tokens=512, top_p=0.8, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=None, stop_words=None, bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[116796, 35287], adapter_name=None.
2024-07-26 00:47:04,889 - lmdeploy - INFO - session_id=1, history_tokens=1032, input_tokens=2, max_new_tokens=512, seq_start=False, seq_end=False, step=0, prep=True
2024-07-26 00:47:04,889 - lmdeploy - INFO - Register stream callback for 1
[TM][INFO] [forward] Enqueue requests
[TM][INFO] [forward] Wait for requests to complete ...
[TM][INFO] [ProcessInferRequests] Request for 1 received.
[TM][INFO] [Forward] [0, 1), dc_bsz = 0, pf_bsz = 1, n_tok = 3, max_q = 3, max_k = 1034
[TM][INFO] ------------------------- step = 1040 -------------------------
[TM][INFO] ------------------------- step = 1050 -------------------------
[TM][INFO] ------------------------- step = 1060 -------------------------
[TM][INFO] ------------------------- step = 1070 -------------------------
[TM][INFO] ------------------------- step = 1080 -------------------------
[TM][INFO] ------------------------- step = 1090 -------------------------
[TM][INFO] ------------------------- step = 1100 -------------------------
[TM][INFO] ------------------------- step = 1110 -------------------------
[TM][INFO] ------------------------- step = 1120 -------------------------
[TM][INFO] ------------------------- step = 1130 -------------------------
[TM][INFO] ------------------------- step = 1140 -------------------------
[TM][INFO] ------------------------- step = 1150 -------------------------
[TM][INFO] ------------------------- step = 1160 -------------------------
[TM][INFO] ------------------------- step = 1170 -------------------------
[TM][INFO] ------------------------- step = 1180 -------------------------
[TM][INFO] ------------------------- step = 1190 -------------------------
[TM][INFO] ------------------------- step = 1200 -------------------------
[TM][INFO] ------------------------- step = 1210 -------------------------
[TM][INFO] ------------------------- step = 1220 -------------------------
[TM][INFO] ------------------------- step = 1230 -------------------------
[TM][INFO] ------------------------- step = 1240 -------------------------
[TM][INFO] ------------------------- step = 1250 -------------------------
[TM][INFO] ------------------------- step = 1260 -------------------------
[TM][INFO] ------------------------- step = 1270 -------------------------
[TM][INFO] ------------------------- step = 1280 -------------------------
[TM][INFO] ------------------------- step = 1290 -------------------------
[TM][INFO] ------------------------- step = 1300 -------------------------
[TM][INFO] ------------------------- step = 1310 -------------------------
[TM][INFO] ------------------------- step = 1320 -------------------------
[TM][INFO] ------------------------- step = 1330 -------------------------
[TM][INFO] ------------------------- step = 1340 -------------------------
[TM][INFO] ------------------------- step = 1350 -------------------------
[TM][INFO] ------------------------- step = 1360 -------------------------
[TM][INFO] ------------------------- step = 1370 -------------------------
[TM][INFO] ------------------------- step = 1380 -------------------------
[TM][INFO] ------------------------- step = 1390 -------------------------
[TM][INFO] ------------------------- step = 1400 -------------------------
[TM][INFO] ------------------------- step = 1410 -------------------------
[TM][INFO] ------------------------- step = 1420 -------------------------
[TM][INFO] ------------------------- step = 1430 -------------------------
[TM][INFO] ------------------------- step = 1440 -------------------------
[TM][INFO] ------------------------- step = 1450 -------------------------
[TM][INFO] ------------------------- step = 1460 -------------------------
[TM][INFO] ------------------------- step = 1470 -------------------------
[TM][INFO] ------------------------- step = 1480 -------------------------
[TM][INFO] ------------------------- step = 1490 -------------------------
[TM][INFO] ------------------------- step = 1500 -------------------------
[TM][INFO] ------------------------- step = 1510 -------------------------
[TM][INFO] ------------------------- step = 1520 -------------------------
[TM][INFO] ------------------------- step = 1530 -------------------------
[TM][INFO] ------------------------- step = 1540 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 1
[TM][INFO] [forward] Request completed for 1
2024-07-26 00:47:11,354 - lmdeploy - INFO - UN-register stream callback for 1

zhangjinnan avatar Jul 25 '24 16:07 zhangjinnan

@lvhan028 hey! I did see that PR and it is merged. I locally pulled the changes and built the docker image, but it still gave me that error. Looks like there is smth missing in the PR? Or maybe smth with the AWQ quant provided? So yeah, just creating my own chat template fixed the issue.

Ichigo3766 avatar Jul 25 '24 17:07 Ichigo3766

What's smth? It works on our side. The model evaluation by opencompass, with lmdeploy as the accelerator, passed. Can you paste the error information here?

lvhan028 avatar Jul 26 '24 03:07 lvhan028

I have a similar error message after deploying the latest lmdeploy release.

aisensiy avatar Jul 29 '24 04:07 aisensiy

Can anyone provide a demo to reproduce the error?

lvhan028 avatar Sep 16 '24 10:09 lvhan028

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] avatar Sep 24 '24 02:09 github-actions[bot]

Can anyone provide a demo to reproduce the error?

You can reproduce the error using llama3.1 70B on a Huawei Ascend 910B.

nullxjx avatar Sep 25 '24 12:09 nullxjx

Can anyone provide a demo to reproduce the error?

You can reproduce the error using llama3.1 70B on a Huawei Ascend 910B.

Please open another issue for llama3.1 70B on Huawei Ascend.

lvhan028 avatar Sep 25 '24 13:09 lvhan028