tensorrtllm_backend
```
obj_size <= remaining_buffer_size
```

Does anyone know what `obj_size` and `remaining_buffer_size` refer to, and where I can adjust them?
Container startup parameters:

```
docker run --rm -it --net host --shm-size=20g \
  --ulimit memlock=-1 --ulimit stack=67108864 --gpus all
```
I have an A5000 GPU and am running the Qwen2.5-3B-Instruct model. With

```
python3 ../run.py --input_text "Hello, what is your name?" --max_output_len=50 --tokenizer_dir ./tmp/Qwen/3B/ --engine_dir=./tmp/Qwen/3B/trt_engines/int4_weight_only/1-gpu/
```

I get normal results.
But starting the backend service reports the error above:

```
python3 /tensorrtllm_backend/scripts/launch_triton_server.py --world_size=1 --model_repo=${MODEL_FOLDER}
```
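For what it's worth, an assertion of the form `obj_size <= remaining_buffer_size` usually points at a fixed-size shared-memory pool being too small for the object being written into it. Assuming this one comes from Triton's Python backend shared-memory pool (an assumption, not confirmed from the log alone), the pool size can be raised via the Python backend's `--backend-config` options when starting `tritonserver` directly; a sketch:

```shell
# Hedged sketch: assumes the assertion originates in Triton's Python backend
# shared-memory pool. shm-default-byte-size / shm-growth-byte-size are
# documented Triton Python backend config keys; the values below (64 MiB)
# are illustrative, and MODEL_FOLDER is the model repo path from the post.
tritonserver \
  --model-repository=${MODEL_FOLDER} \
  --backend-config=python,shm-default-byte-size=67108864 \
  --backend-config=python,shm-growth-byte-size=67108864
```

Note this is separate from Docker's `--shm-size=20g`, which only sets the size of `/dev/shm` inside the container, not how much of it each backend instance reserves.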