grps_trtllm

Problem releasing resources on shutdown, looking for advice!

Open m-wei opened this issue 1 year ago • 4 comments

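As the title says: I based my code on yours and left the overall logic mostly unchanged. I only extracted part of the inference code into a load model → infer → close model flow. Now I'm running into a problem: when everything is destroyed at shutdown, the error below is reported. I debugged it. Simply loading and then closing the model does not trigger the error. But if EnqueueAndWait in the process stage calls auto trtllm_request_id = executor_->enqueueRequest(executor_request);, the error appears. Do I need to add some kind of release call?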

[TensorRT-LLM][ERROR] tensorrt_llm::common::TllmException: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaFreeAsync(ptr, mCudaStream->get()): context is destroyed (/workspace/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmBuffers.h:122)
1 0x7ff3c266a7aa ./libs/grps_trtllm/lib/libtensorrt_llm.so(+0x5ca7aa) [0x7ff3c266a7aa]
2 0x7ff3c3c6e28a virtual thunk to tensorrt_llm::runtime::GenericTensor<tensorrt_llm::runtime::CudaAllocatorAsync>::~GenericTensor() + 154
3 0x7ff437b4cdca std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 58
4 0x7ff3c41003d3 std::_Sp_counted_ptr_inplace<tensorrt_llm::batch_manager::LlmRequest, std::allocator<tensorrt_llm::batch_manager::LlmRequest>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 563
5 0x7ff3c414dc0f tensorrt_llm::batch_manager::TrtGptModelInflightBatching::~TrtGptModelInflightBatching() + 1135
6 0x7ff3c416e3e4 tensorrt_llm::executor::Executor::Impl::~Impl() + 2532
7 0x7ff3c416d495 tensorrt_llm::executor::Executor::~Executor() + 21
8 0x7ff437b49336 Mllm::TrtLlmModelInstance::~TrtLlmModelInstance() + 358
9 0x7ff437b464da Mllm::TrtllmInferer::~TrtllmInferer() + 442
10 0x7ff437b465dd Mllm::TrtllmInferer::~TrtllmInferer() + 13
11 0x7ff437b55532 EventFilterAlg::Close(void*) + 66
12 0x56319373db89 ./InterVL2(+0xab89) [0x56319373db89]
13 0x7ff437772d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7ff437772d90]
14 0x7ff437772e40 __libc_start_main + 128
15 0x56319373d2b5 ./InterVL2(+0xa2b5) [0x56319373d2b5]

m-wei, Dec 17 '24 01:12
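The log shows the async free failing inside ~Executor, and the CUDA context was already gone at that point. The thread does not establish what destroyed the context. So what follows is only a minimal sketch of one way to rule out an ordering problem. It uses hypothetical stand-in types (FakeContext, FakeBuffer, FakeExecutor, Inferer), not the TensorRT-LLM API. The wrapper's Close() releases the executor, together with any request buffers it still holds, before the context is torn down. In real code, step 1 would roughly correspond to destroying the tensorrt_llm::executor::Executor before anything that releases the CUDA context. If your TensorRT-LLM version's Executor provides a shutdown() method, you could call that first; treat this as an assumption to check against your executor.h.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA context. `alive` flips to false when the
// context is torn down; any buffer freed after that is counted in
// `late_frees`, modelling the "cudaFreeAsync ... context is destroyed" error.
struct FakeContext {
    bool alive = true;
    int late_frees = 0;
};

// Hypothetical stand-in for a device buffer owned by a request.
class FakeBuffer {
public:
    explicit FakeBuffer(std::shared_ptr<FakeContext> ctx) : ctx_(std::move(ctx)) {}
    ~FakeBuffer() {
        if (!ctx_->alive) ++ctx_->late_frees;  // freed after context teardown
    }
private:
    std::shared_ptr<FakeContext> ctx_;
};

// Hypothetical executor that, as the trace suggests for the real one, keeps
// request state (and its buffers) alive until the executor is destroyed.
class FakeExecutor {
public:
    explicit FakeExecutor(std::shared_ptr<FakeContext> ctx) : ctx_(std::move(ctx)) {}
    int enqueueRequest() {
        requests_.push_back(std::make_unique<FakeBuffer>(ctx_));
        return static_cast<int>(requests_.size());
    }
private:
    std::shared_ptr<FakeContext> ctx_;
    std::vector<std::unique_ptr<FakeBuffer>> requests_;
};

// Load -> Infer -> Close wrapper with an explicit teardown order.
class Inferer {
public:
    void Load() {
        ctx_ = std::make_shared<FakeContext>();
        executor_ = std::make_unique<FakeExecutor>(ctx_);
    }
    int Infer() {
        assert(executor_ && "Load() must be called before Infer()");
        return executor_->enqueueRequest();
    }
    void Close() {
        executor_.reset();              // 1. free executor and pending requests
        if (ctx_) ctx_->alive = false;  // 2. only then tear down the context
    }
    ~Inferer() { Close(); }  // Close() is safe to call more than once
    std::shared_ptr<FakeContext> context() const { return ctx_; }
private:
    // Declared before executor_, so even the implicit destructor would
    // destroy executor_ first (members die in reverse declaration order).
    std::shared_ptr<FakeContext> ctx_;
    std::unique_ptr<FakeExecutor> executor_;
};
```

If the executor is destroyed after `alive` is set to false, `late_frees` becomes nonzero; going through Inferer::Close() keeps it at 0. An explicit Close() makes the order visible in one place instead of relying on whichever destructor happens to run last.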

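I'm essentially using the TrtllmInferer class directly, and the error is reported during cleanup when main exits. I did not destroy anything manually before that.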

m-wei, Dec 17 '24 01:12
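When nothing is destroyed manually, C++ decides the teardown order implicitly. Locals are destroyed in reverse order of construction, and class members in reverse order of declaration. The self-contained sketch below makes that order visible. ContextGuard and ExecutorHolder are hypothetical types standing in for whatever owns the CUDA context and the executor in this program.

```cpp
#include <string>
#include <vector>

// Hypothetical log recording the order in which components are torn down.
std::vector<std::string> g_teardown;

// Hypothetical stand-ins: real code would own the CUDA context and the
// TensorRT-LLM executor respectively.
struct ContextGuard {
    ~ContextGuard() { g_teardown.push_back("context"); }
};
struct ExecutorHolder {
    ~ExecutorHolder() { g_teardown.push_back("executor"); }
};

// Locals are destroyed in reverse order of construction: the executor,
// created last, is released first while the context is still alive.
std::vector<std::string> SafeOrder() {
    g_teardown.clear();
    {
        ContextGuard context;
        ExecutorHolder executor;
    }
    return g_teardown;
}

// Constructing them the other way round tears the context down first,
// the pattern that would make the executor's buffer frees fail.
std::vector<std::string> UnsafeOrder() {
    g_teardown.clear();
    {
        ExecutorHolder executor;
        ContextGuard context;
    }
    return g_teardown;
}
```

SafeOrder() records executor before context, and UnsafeOrder() records the reverse. The same rule applies to class members. If a single class owns both the context and the executor, declaring the context owner first guarantees that the executor member is destroyed before it.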

As long as I keep calling EnqueueAndWait, inference works and returns correct results.

m-wei, Dec 17 '24 01:12

Also, I'm not using any of the HTTP-related code.

m-wei, Dec 17 '24 01:12

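Sorry, this project never needs to release the model, so I haven't looked into it. Judging from the log, though, TensorRT-LLM seems to fail while releasing the requests held by the executor. Maybe open an issue on NVIDIA's TensorRT-LLM project?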

zhaocc1106, Dec 17 '24 12:12