tensorrtllm_backend
```
obj_size <= remaining_buffer_size
```

Does anyone know what `obj_size` and `remaining_buffer_size` refer to, and where I can adjust them?
Container startup parameters:

```
docker run --rm -it --net host --shm-size=20g \
  --ulimit memlock=-1 --ulimit stack=67108864 --gpus all
```
I have an A5000 GPU and am running the Qwen2.5-3B-Instruct model. With

```
python3 ../run.py --input_text "Hello, what is your name?" --max_output_len=50 --tokenizer_dir ./tmp/Qwen/3B/ --engine_dir=./tmp/Qwen/3B/trt_engines/int4_weight_only/1-gpu/
```

I get normal results.
But starting the backend service reports the error above:

```
python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=${MODEL_FOLDER}
```
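For what it's worth, an assertion of the form `obj_size <= remaining_buffer_size` usually points at a fixed-size shared-memory pool being too small for the object being written into it. Assuming this one comes from Triton's Python backend shared-memory pool (an assumption, not confirmed from the log alone), the pool size can be raised via the Python backend's `--backend-config` options when starting `tritonserver` directly; a sketch:

```shell
# Hedged sketch: assumes the assertion originates in Triton's Python backend
# shared-memory pool. shm-default-byte-size / shm-growth-byte-size are
# documented Triton Python backend config keys; the values below (64 MiB)
# are illustrative, and MODEL_FOLDER is the model repo path from the post.
tritonserver \
  --model-repository=${MODEL_FOLDER} \
  --backend-config=python,shm-default-byte-size=67108864 \
  --backend-config=python,shm-growth-byte-size=67108864
```

Note this is separate from Docker's `--shm-size=20g`, which only sets the size of `/dev/shm` inside the container, not how much of it each backend instance reserves.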