tensorrtllm_backend
Error `malloc(): unaligned tcache chunk detected` always occurs after the TensorRT-LLM server handles a certain number of requests
System Info
- Ubuntu 20.04
- NVIDIA A100
Who can help?
@kaiyux
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
- docker run -itd --gpus=all --shm-size=1g -p8000:8000 -p8001:8001 -p8002:8002 -v /share/datasets:/share/datasets nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
- Code version is 0.11.0: `git clone https://github.com/NVIDIA/TensorRT-LLM.git` and `git clone https://github.com/triton-inference-server/tensorrtllm_backend.git`
- Perform serving inference calls via aiohttp (see the sketch below)
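For reference, the load is generated roughly like the sketch below. This is only an illustration, not the exact client: the model name (`ensemble`), the `generate` endpoint, and the request fields are assumptions based on a default TensorRT-LLM ensemble deployment and may differ from the real setup.

```python
import asyncio
import aiohttp

# Assumed endpoint for the default TensorRT-LLM "ensemble" model;
# adjust the model name and payload fields to match the deployment.
URL = "http://localhost:8000/v2/models/ensemble/generate"

async def infer(session: aiohttp.ClientSession, prompt: str) -> dict:
    payload = {"text_input": prompt, "max_tokens": 64}
    async with session.post(URL, json=payload) as resp:
        resp.raise_for_status()
        return await resp.json()

async def main(num_requests: int = 5000, concurrency: int = 32) -> None:
    # Bound the number of in-flight requests with a semaphore.
    sem = asyncio.Semaphore(concurrency)
    async with aiohttp.ClientSession() as session:
        async def bounded(i: int) -> dict:
            async with sem:
                return await infer(session, f"request {i}: hello")
        results = await asyncio.gather(*(bounded(i) for i in range(num_requests)))
    print(f"completed {len(results)} requests")

if __name__ == "__main__":
    asyncio.run(main())
```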
Expected behavior
All requests are processed successfully with no errors.
Actual behavior
After the server has handled many inferences (e.g., around 5000), it raises the error
malloc(): unaligned tcache chunk detected
Signal (6) received.
Both continuous inference and intermittent inference (e.g., spread out over a day) trigger this error.
When I issue 8000 inference calls in a single test, it raises the error pinned_memory_manager.cc:170] "failed to allocate pinned system memory, falling back to non-pinned system memory". I eventually set cuda-memory-pool-byte-size and pinned-memory-pool-byte-size to 512 MB, which resolved that error, but these two parameters are not exposed in scripts/launch_triton_server.py. I would like to understand why this problem occurs and whether there is another way to solve it.
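As a workaround I currently bypass scripts/launch_triton_server.py and pass the pool sizes to tritonserver directly. A minimal sketch of that launch is shown below; the model repository path is a placeholder, multi-GPU/MPI handling is omitted, and note that `--cuda-memory-pool-byte-size` is specified per GPU as `<gpu_id>:<bytes>`.

```python
import subprocess

POOL_BYTES = 512 * 1024 * 1024  # 512 MB for both pools (workaround value)

cmd = [
    "tritonserver",
    "--model-repository=/path/to/triton_model_repo",  # placeholder path
    f"--pinned-memory-pool-byte-size={POOL_BYTES}",
    # The CUDA pool is set per GPU as <gpu_id>:<size_in_bytes>.
    f"--cuda-memory-pool-byte-size=0:{POOL_BYTES}",
]

subprocess.run(cmd, check=True)
```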
When I call the server with high concurrency, it raises the error
malloc_consolidate(): unaligned fastbin chunk detected
Signal (6) received.
I hope you can help me solve these problems. Thank you very much!
Additional notes
This seems to happen because the server does not completely clean up memory after each inference completes.