Serving InternLM2-Chat-20B with the lmdeploy CLI on dual V100 GPUs: after running for a while, requests fail with: an illegal memory access was encountered /lmdeploy/src/turbomind/utils/allocator.h:231
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
Describe the bug
Serving InternLM2-Chat-20B with the lmdeploy CLI on dual V100 GPUs: after the server has been running for a while, requests fail with "an illegal memory access was encountered /lmdeploy/src/turbomind/utils/allocator.h:231". The error output is as follows:
INFO: 0.0.0.0:51753 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 0.0.0.0:51753 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: 0.0.0.0:51753 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2024-03-14 07:05:16,356 - lmdeploy - WARNING - kwargs request_output_len is deprecated for inference, use GenerationConfig instead.
terminate called after throwing an instance of 'std::runtime_error'
what(): [TM][ERROR] CUDA runtime error: an illegal memory access was encountered /lmdeploy/src/turbomind/utils/allocator.h:231
terminate called recursively
Aborted (core dumped)
Reproduction
lmdeploy serve api_server --server-port 8001 internlm2-chat-20b --tp 2
Environment
* python=3.10.2
* lmdeploy=0.2.5
* cuda=12.1
Error traceback
No response
It is hard to locate the problem from the information so far. Could you set the environment variable export TM_DEBUG_LEVEL=DEBUG and try again?
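For reference, one way to capture that debug output might look like the sketch below. The log filename is just an illustration; CUDA_LAUNCH_BLOCKING is a standard CUDA toggle that makes kernel launches synchronous, which often moves the "illegal memory access" report to the kernel that actually caused it instead of a later sync point (at the cost of throughput, so it is for debugging only).

```shell
# Turn on TurboMind's verbose logging and make CUDA kernel launches
# synchronous, so the illegal memory access is reported near its source.
export TM_DEBUG_LEVEL=DEBUG
export CUDA_LAUNCH_BLOCKING=1

# Relaunch the server exactly as before, keeping a copy of the log on disk.
lmdeploy serve api_server internlm2-chat-20b --server-port 8001 --tp 2 2>&1 | tee lmdeploy_debug.log
```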
After switching to a Docker deployment the crash went away, but after the server has been up for a while the API starts returning a long run of vertical bars. I have set the environment variable you suggested and am running with it now; if the error appears again I will report back immediately. Here is one such response:
{
"id": "248",
"object": "chat.completion",
"created": 1711095730,
"model": "internlm2-chat-20b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||"
},
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 103,
"total_tokens": 2152,
"completion_tokens": 2049
}
}
The log for the problem above is as follows:
...
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key ite
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/stop_criteria_kernels.cu:104
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: sequence_limit_length
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key sequence_limit_length
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: ite
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key max_input_length
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: max_input_length
[TM][DEBUG] T turbomind::Tensor::getVal() const [with T = int] start
[TM][DEBUG] T turbomind::Tensor::getVal(size_t) const [with T = int; size_t = long unsigned int] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key logits
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: logits
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = float] start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: stop_words_list
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key stop_words_list
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: stop_words_list
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key finished
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] T* turbomind::Tensor::getPtrWithOffset(size_t) const [with T = bool; size_t = long unsigned int] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key stop_words_list
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: stop_words_list
[TM][DEBUG] T* turbomind::Tensor::getPtrWithOffset(size_t) const [with T = const int; size_t = long unsigned int] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key output_ids
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output_ids
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = const int] start
[TM][DEBUG] void turbomind::invokeStopWordsCriterion(const int*, const int*, const int*, bool*, size_t, size_t, int, int, int, cudaStream_t) start
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/stop_criteria_kernels.cu:104
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: sequence_limit_length
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key sequence_limit_length
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: sequence_limit_length
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = const unsigned int] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key should_stop
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: should_stop
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = bool] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key finished
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = bool] start
[TM][DEBUG] void turbomind::invokeLengthCriterion(bool*, bool*, int*, const uint32_t*, int, int, int, cudaStream_t) start
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/stop_criteria_kernels.cu:159
[TM][INFO] [Forward] step = 624, [ 127]
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: sequence_limit_length
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = const unsigned int] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key should_stop
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: should_stop
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = bool] start
[TM][DEBUG] turbomind::Tensor& turbomind::TensorMap::at(const string&) for key finished
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: finished
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaBatch.cc:1178
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = bool] start
[TM][DEBUG] void turbomind::invokeLengthCriterion(bool*, bool*, int*, const uint32_t*, int, int, int, cudaStream_t) start
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output_ids
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] getPtr with type i4, but data type is: u4
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: sequence_length
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] getPtr with type i4, but data type is: u4
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/kernels/stop_criteria_kernels.cu:159
[TM][DEBUG] run syncAndCheck at /lmdeploy/src/turbomind/models/llama/LlamaBatch.cc:1178
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: output_ids
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] getPtr with type i4, but data type is: u4
[TM][DEBUG] bool turbomind::TensorMap::isExist(const string&) const for key: sequence_length
[TM][DEBUG] T* turbomind::Tensor::getPtr() const [with T = int] start
[TM][DEBUG] getPtr with type i4, but data type is: u4
[TM][INFO] [Finish] slot 0, tokens [ 1 92543 9081 364 60403 68625 90044 78193 75783 80118 70180 72018 329 68884 70180 74595 80118 68553 68693 73885 90044 68306 68347 68914 69928 328 262 68263 68693 60967 71404 69928 68897 328 76229 68319 60403 68630 68534 80118 60377 60967 71404 278 68345 69928 1986 70923 70180 68508 80118 60377 328 71724 87986 642 312 281 262 60403 60900 69616 80118 83043 484 90044 68306 68347 68914 69928 69180 297 262 81990 68260 328 262 68313 70180 60389 69771 69180 68322 80118 364 314 281 262 73589 86910 70699 68878 68274 68855 328 262 69192 334 262 74595 80118 60374 334 461 69358 73228 628 262 74595 80118 60374 334 461 69358 61233 60494 628 262 74595 80118 60374 334 461 60655 86426 68914 628 74595 80118 60374 334 461 68345 69928 830 308 281 262 68614 89041 70180 60389 69358 61233 60494 69928 60366 72702 80118 60353 69192 60387 74595 80118 60374 60387 60419 69358 61233 60494 285 69712 60420 60353 74595 80118 60374 60387 60419 69358 61233 60494 285 70651 60420 60353 74595 80118 60374 60387 60419 69358 61233 60494 285 70293 402 14511 262 90044 68306 68347 68914 69928 69180 642 285 262 69358 73228 334 262 69358 68508 76024 81539 364 285 262 69358 61233 60494 60387 69358 68379 61233 60494 364 285 262 60655 86426 68914 334 262 69358 68508 86426 69323 75543 402 76885 312 60387 364 25747 334 262 69844 461 72863 69358 68300 68306 73515 68349 60527 76024 68740 278 72363 60354 80118 60357 68274 69358 73228 68319 69358 61233 60494 69845 60655 86426 68914 364 69358 61233 60494 285 70293 402 76885 314 60387 364 25747 334 262 69844 461 72863 69358 68300 68306 68379 61233 60494 82089 60504 278 72363 60354 80118 60357 68274 69358 73228 68319 69358 61233 60494 69845 60655 86426 68914 364 69358 61233 60494 402 76885 308 60387 364 25747 334 262 69844 461 72863 69358 68300 68306 75543 278 72363 60354 80118 60357 68274 69358 73228 68319 69358 61233 60494 69845 60655 86426 68914 364 69358 60655 86426 68914 92542 364 92543 1008 364 70180 72363 69060 68290 68306 69712 
60430 61846 68740 60504 60354 80118 92542 364 92543 525 11353 364 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127]
[TM][INFO] [Interrupt] slot = 0, id = 50
[TM][INFO] [Interrupt] slot 0, tokens [ 1 92543 9081 364 60403 68625 90044 78193 75783 80118 70180 72018 329 68884 70180 74595 80118 68553 68693 73885 90044 68306 68347 68914 69928 328 262 68263 68693 60967 71404 69928 68897 328 76229 68319 60403 68630 68534 80118 60377 60967 71404 278 68345 69928 1986 70923 70180 68508 80118 60377 328 71724 87986 642 312 281 262 60403 60900 69616 80118 83043 484 90044 68306 68347 68914 69928 69180 297 262 81990 68260 328 262 68313 70180 60389 69771 69180 68322 80118 364 314 281 262 73589 86910 70699 68878 68274 68855 328 262 69192 334 262 74595 80118 60374 334 461 69358 73228 628 262 74595 80118 60374 334 461 69358 61233 60494 628 262 74595 80118 60374 334 461 60655 86426 68914 628 74595 80118 60374 334 461 68345 69928 830 308 281 262 68614 89041 70180 60389 69358 61233 60494 69928 60366 72702 80118 60353 69192 60387 74595 80118 60374 60387 60419 69358 61233 60494 285 69712 60420 60353 74595 80118 60374 60387 60419 69358 61233 60494 285 70651 60420 60353 74595 80118 60374 60387 60419 69358 61233 60494 285 70293 402 14511 262 90044 68306 68347 68914 69928 69180 642 285 262 69358 73228 334 262 69358 68508 76024 81539 364 285 262 69358 61233 60494 60387 69358 68379 61233 60494 364 285 262 60655 86426 68914 334 262 69358 68508 86426 69323 75543 402 76885 312 60387 364 25747 334 262 69844 461 72863 69358 68300 68306 73515 68349 60527 76024 68740 278 72363 60354 80118 60357 68274 69358 73228 68319 69358 61233 60494 69845 60655 86426 68914 364 69358 61233 60494 285 70293 402 76885 314 60387 364 25747 334 262 69844 461 72863 69358 68300 68306 68379 61233 60494 82089 60504 278 72363 60354 80118 60357 68274 69358 73228 68319 69358 61233 60494 69845 60655 86426 68914 364 69358 61233 60494 402 76885 308 60387 364 25747 334 262 69844 461 72863 69358 68300 68306 75543 278 72363 60354 80118 60357 68274 69358 73228 68319 69358 61233 60494 69845 60655 86426 68914 364 69358 60655 86426 68914 92542 364 92543 1008 364 70180 72363 69060 68290 68306 
69712 60430 61846 68740 60504 60354 80118 92542 364 92543 525 11353 364 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127 127]
[TM][INFO] [forward] Request complete for 50, code 0
[TM][DEBUG] static std::shared_ptr<std::unordered_map<std::basic_string<char>, triton::Tensor> > LlamaTritonModelInstance<T>::convert_outputs(const std::unordered_map<std::basic_string<char>, turbomind::Tensor>&) [with T = __half]
[TM][DEBUG] static std::shared_ptr<std::unordered_map<std::basic_string<char>, triton::Tensor> > LlamaTritonModelInstance<T>::convert_outputs(const std::unordered_map<std::basic_string<char>, turbomind::Tensor>&) [with T = __half]
"finish_reason": "length" means the model kept generating and never stopped on its own. Is this the official model? Could you provide a way to reproduce it, e.g. the prompt data or a script?
The problem is not deterministic and is not tied to any particular prompt; any prompt can trigger it, so there is no reliable way to reproduce it. The server works fine right after startup, but after a few requests it crashes, and restarting the server fixes it again. We call it through the OpenAI-compatible interface. Longer prompts make the problem more likely: sometimes the server crashes outright, sometimes it returns a long run of vertical bars. Here is a sample request (the content has been redacted):
curl --location --request POST 'http://127.0.0.1:8001/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "internlm2-chat-20b",
"messages": [
{
"role": "user",
"content": "xxxxxxxxxxxxxxx"
}
],
"temperature": 0,
"top_p": 0,
"max_tokens": 2048,
"echo": false,
"stream": false,
"repetition_penalty": 1.0,
"functions": []
}'
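For completeness, the same request can be issued from Python with only the standard library. This is just a sketch of the curl call above: the server address and the redacted prompt placeholder come from this thread, while `build_payload` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    # Mirrors the curl request above; greedy decoding (temperature=0)
    # makes degenerate outputs like the "|||..." runs easier to reproduce.
    return {
        "model": "internlm2-chat-20b",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "top_p": 0,
        "max_tokens": 2048,
        "stream": False,
    }


def chat(prompt: str, base_url: str = "http://127.0.0.1:8001") -> dict:
    # POST to the OpenAI-compatible endpoint exposed by lmdeploy serve api_server.
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `chat("...")` in a loop and checking `choices[0]["message"]["content"]` for long runs of "|" would be one way to watch for the degenerate output automatically.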
The model was downloaded from the huggingface and modelscope links given in the official GitHub repo. If you suspect a corrupted model file, I can re-download it and retry.
Could you upgrade to v0.3.0 and try again? We fixed some bugs that may resolve the issues you are seeing. If it still behaves the same, I will find a V100 server and try to reproduce it.
I upgraded to v0.3.0 as soon as it came out and tried again; both problems still occur, and if anything they feel more frequent than before.
Also, I have finished re-downloading the model files and will retry to check whether the files were the problem.
You can try export TM_DEBUG_LEVEL=DEBUG. Also, if possible, launch the server under gdb; that would help a lot in locating the problem.
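As a sketch, launching the server under gdb so the abort leaves a usable backtrace might look like the following. The `$(which lmdeploy)` trick assumes the lmdeploy console script is on PATH; since the CLI is a Python entry point, gdb must debug the Python interpreter running it.

```shell
export TM_DEBUG_LEVEL=DEBUG

# "-ex run" starts the server immediately; "-ex bt" prints a native
# backtrace once the std::runtime_error abort fires inside TurboMind.
gdb -ex run -ex bt --args \
    python "$(which lmdeploy)" serve api_server internlm2-chat-20b --server-port 8001 --tp 2
```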
Please try the latest version https://github.com/InternLM/lmdeploy/releases/tag/v0.5.1
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.