[Bug] The latest lmdeploy 0.5.1 fails during inference when serving cogvlm2 on V100 and RTX 2080 Ti: ERROR - Engine loop failed with error: map::at
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
Describe the bug
Serving cogvlm2 with the latest lmdeploy 0.5.1 on a multi-GPU V100 or 2080 Ti server starts up normally, but during inference the error `ERROR - Engine loop failed with error: map::at` is raised and the service then terminates. Inference with internvl1.5-awq on the same setup works fine.
Reproduction
Deploy with lmdeploy 0.5.1 on V100 and send an inference request through the OpenAI-compatible API; the error is triggered.
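A minimal client sketch of the failing request, assuming a server launched with the command shown under "Error traceback" below. The image file name and prompt text here are placeholder assumptions, not the exact inputs from the log:

```python
# Reproduction sketch against an lmdeploy OpenAI-compatible server.
# Assumptions: server at localhost:8000 (see launch command below);
# "test_form.jpg" is a hypothetical local test image.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# Encode a local image as a data URL for the vision request.
with open("test_form.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Use whatever model name the server registered.
model_name = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What kind of form is shown in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    temperature=0.0,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```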
Environment
2x V100, or 4x modified 2080 Ti 22 GB
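Both of these cards are pre-Ampere (V100 is sm_70, the 2080 Ti is sm_75), and the traceback below fails inside Triton's lowering step, which takes the device capability as an input (`target.capability`). A quick sketch, using only standard PyTorch calls, to confirm which compute capability the compiler sees on the affected machine:

```python
# Print the compute capability of each visible GPU; the Triton compile
# step in the traceback below is parameterized by this value.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{major}{minor})")
```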
Error traceback
Launch command:

```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
lmdeploy serve api_server /root/autodl-tmp/model_zoo/cogvlm2-llama3-chinese-chat-19B/ --backend pytorch --tp 2 --server-name 0.0.0.0 --server-port 8000 --log-level INFO
```
Startup log:
2024-07-17 12:15:43,792 - lmdeploy - INFO - matching vision model: CogVLMVisionModel
2024-07-17 12:15:49,952 - lmdeploy - INFO - input backend=pytorch, backend_config=PytorchEngineConfig(model_name=None, tp=2, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, eviction_type='recompute', prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=4096, thread_safe=False, enable_prefix_caching=False, device_type='cuda', download_dir=None, revision=None)
2024-07-17 12:15:49,952 - lmdeploy - INFO - input chat_template_config=None
2024-07-17 12:15:49,960 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='cogvlm2', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-07-17 12:15:49,990 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-07-17 12:15:50,698 - lmdeploy - INFO - Checking model.
2024-07-17 12:15:50,699 - lmdeploy - WARNING - LMDeploy requires transformers version: [4.33.0 ~ 4.41.2], but found version: 4.42.3
2024-07-17 12:15:50,709 - lmdeploy - INFO - MASTER_ADDR=127.0.0.1, MASTER_PORT=29500
2024-07-17 12:15:53,705 - lmdeploy - INFO - Patching model.
2024-07-17 12:15:53,933 - lmdeploy - INFO - Loading model weights, please waiting.
2024-07-17 12:16:04,068 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=1024, num_gpu_blocks=1699, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096, enable_prefix_caching=False)
2024-07-17 12:16:06,814 - lmdeploy - INFO - updated backend_config=PytorchEngineConfig(model_name=None, tp=2, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, eviction_type='recompute', prefill_interval=16, block_size=64, num_cpu_blocks=0, num_gpu_blocks=0, adapters=None, max_prefill_token_num=4096, thread_safe=False, enable_prefix_caching=False, device_type='cuda', download_dir=None, revision=None)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
HINT: Please open http://0.0.0.0:8000 in a browser for detailed api usage!!!
INFO: Started server process [1073]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Inference log:
INFO: 127.0.0.1:56100 - "GET /v1/models HTTP/1.1" 200 OK
INFO: 127.0.0.1:56112 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-07-17 12:16:32,555 - lmdeploy - INFO - start ImageEncoder._forward_loop
2024-07-17 12:16:32,555 - lmdeploy - INFO - ImageEncoder received 1 images, left 1 images.
2024-07-17 12:16:32,555 - lmdeploy - INFO - ImageEncoder process 1 images, left 0 images.
2024-07-17 12:16:36,615 - lmdeploy - INFO - ImageEncoder forward 1 images, cost 4.060s
2024-07-17 12:16:36,616 - lmdeploy - INFO - ImageEncoder done 1 images, left 0 images.
2024-07-17 12:16:36,618 - lmdeploy - INFO - prompt='<IMAGE_TOKEN>Question: (\'你是一个表单审核人员,需要判断图片中表格的类型。请判断图片是哪一种表格:[0."其他类型",1."活动记录表",2."登记表"],如果是"其他类型",输出"type":0;如果是"活动记录表",输出"type":1;如果是"登记表",则输出"type":2。结果只按照如下json格式输出:{ "type":}\',) Answer:', gen_config=EngineGenerationConfig(n=1, max_new_tokens=2048, top_p=0.7, top_k=40, temperature=0.0, repetition_penalty=1.0, ignore_eos=False, random_seed=11971543465637689405, stop_words=[128001], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[128000, 0, 0, 0, ...(long run of 0 image-placeholder tokens elided)..., 0, 0, 14924, 25, 4417, 57668, 122503, 21405, 24946, 91951, 108585, 3922, 86206, 122225, 47030, 16325, 21405, 35083, 9554, 33005, 1811, 15225, 122225, 47030, 21043, 106189, 120143, 21405, 35083, 5232, 58, 15, 1210, 93994, 33005, 498, 16, 1210, 108726, 66677, 21405, 498, 17, 1210, 29741, 41914, 21405, 1365, 3922, 63344, 21043, 1, 93994, 33005, 1, 3922, 67117, 45570, 794, 15, 26016, 63344, 21043, 1, 108726, 66677, 21405, 1, 3922, 67117, 45570, 794, 16, 26016, 63344, 21043, 1, 29741, 41914, 21405, 1, 3922, 47548, 67117, 45570, 794, 17, 1811, 60251, 92780, 117026, 121589, 2285, 69905, 67117, 12832, 330, 1337, 794, 17266, 8, 22559, 25], adapter_name=None.
2024-07-17 12:16:36,618 - lmdeploy - INFO - session_id=1, history_tokens=0, input_tokens=2408, max_new_tokens=2048, seq_start=True, seq_end=True, step=0, prep=True
2024-07-17 12:16:36,618 - lmdeploy - WARNING - `temperature` is 0, set to 1e-6
2024-07-17 12:16:38,315 - lmdeploy - ERROR - Engine loop failed with error: map::at
Traceback (most recent call last):
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/request.py", line 17, in _raise_exception_on_finish
task.result()
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 857, in async_loop
await self._async_loop()
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 847, in _async_loop
await __step(True)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 833, in __step
raise e
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 825, in __step
raise out
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 774, in _async_loop_background
await self._async_step_background(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 683, in _async_step_background
output = await self._async_model_forward(inputs,
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/utils.py", line 253, in __tmp
return (await func(*args, **kwargs))
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 590, in _async_model_forward
return await __forward(inputs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine.py", line 568, in __forward
return await self.model_agent.async_forward(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 1241, in async_forward
output = self._forward_impl(inputs,
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 1208, in _forward_impl
output = model_forward(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 497, in model_forward
output = patched_model.patched_forward(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/models/patch.py", line 210, in __call__
output = self._model(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 649, in forward
outputs = self.model(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/models/cogvlm.py", line 268, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_cogvlm.py", line 262, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1561, in _call_impl
result = forward_call(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/models/cogvlm.py", line 233, in forward
return self._contiguous_batching_forward_impl(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/models/cogvlm.py", line 191, in _contiguous_batching_forward_impl
paged_attention_fwd(
File "<string>", line 3, in paged_attention_fwd
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/kernels/dispatcher.py", line 95, in load_and_call
return self.dispatched_func(*args, **kwargs)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/kernels/cuda/pagedattention.py", line 759, in paged_attention_fwd
_fwd_kernel[grid](q,
File "<string>", line 41, in __fwd_kernel_launcher
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/triton/runtime/jit.py", line 532, in run
self.cache[device][key] = compile(
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/triton/compiler/compiler.py", line 543, in compile
next_module = compile_kernel(module)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/triton/compiler/compiler.py", line 441, in <lambda>
lambda src: ttgir_to_llir(src, extern_libs, target, tma_infos))
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/triton/compiler/compiler.py", line 167, in ttgir_to_llir
return translate_triton_gpu_to_llvmir(mod, target.capability, tma_infos, runtime.TARGET.NVVM)
IndexError: map::at
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/responses.py", line 265, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/responses.py", line 261, in wrap
await func()
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
message = await receive()
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 553, in receive
await self.message_event.wait()
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/locks.py", line 226, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f15eddfc130
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
| return await self.app(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
| await self.app(scope, receive, _send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/middleware/cors.py", line 85, in __call__
| await self.app(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/routing.py", line 756, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/routing.py", line 776, in app
| await route.handle(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/routing.py", line 297, in handle
| await self.app(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/routing.py", line 77, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/routing.py", line 75, in app
| await response(scope, receive, send)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/responses.py", line 265, in __call__
| await wrap(partial(self.listen_for_disconnect, receive))
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.BaseExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/queues.py", line 166, in get
| await getter
| asyncio.exceptions.CancelledError
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/tasks.py", line 490, in wait_for
| return fut.result()
| asyncio.exceptions.CancelledError
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/request.py", line 171, in __no_threadsafe_get
| return await asyncio.wait_for(self.resp_que.get(), timeout)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
| raise exceptions.TimeoutError() from exc
| asyncio.exceptions.TimeoutError
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/runners.py", line 44, in run
| return loop.run_until_complete(main)
| File "uvloop/loop.pyx", line 1511, in uvloop.loop.Loop.run_until_complete
| File "uvloop/loop.pyx", line 1504, in uvloop.loop.Loop.run_until_complete
| File "uvloop/loop.pyx", line 1377, in uvloop.loop.Loop.run_forever
| File "uvloop/loop.pyx", line 555, in uvloop.loop.Loop._run
| File "uvloop/loop.pyx", line 474, in uvloop.loop.Loop._on_idle
| File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
| File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/serve/openai/api_server.py", line 504, in completion_stream_generator
| async for res in result_generator:
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/serve/async_engine.py", line 615, in generate
| async for outputs in generator.async_stream_infer(
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/engine_instance.py", line 177, in async_stream_infer
| resp = await self.req_sender.async_recv(req_id)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/request.py", line 314, in async_recv
| resp: Response = await self._async_resp_get()
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/request.py", line 187, in _async_resp_get
| return await __no_threadsafe_get()
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/request.py", line 175, in __no_threadsafe_get
| exit(1)
| File "/root/miniconda3/envs/lmdeploy/lib/python3.9/_sitebuiltins.py", line 26, in __call__
| raise SystemExit(code)
| SystemExit: 1
+------------------------------------
ERROR: Traceback (most recent call last):
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/queues.py", line 166, in get
await getter
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/tasks.py", line 490, in wait_for
return fut.result()
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/site-packages/lmdeploy/pytorch/engine/request.py", line 171, in __no_threadsafe_get
return await asyncio.wait_for(self.resp_que.get(), timeout)
File "/root/miniconda3/envs/lmdeploy/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
During handling of the above exception, another exception occurred: