TensorRT-LLM
Cpp runner outputs wrong results when using lora + tensor parallelism
System Info
x86_64, debian 11, A100 GPUs
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
On a VM with 2 A100 GPUs:
- pip3 install tensorrt_llm==0.16.0 --extra-index-url https://pypi.nvidia.com/
- git clone -b v0.16.0 https://github.com/NVIDIA/TensorRT-LLM.git
- git clone https://huggingface.co/unsloth/Llama-3.2-3B-Instruct
- git clone https://huggingface.co/ss-galileo/llama-3.2-3B-lora
- Run the commands below to build the engine
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Llama-3.2-3B-Instruct --output_dir ./tllm_3b_checkpoint_2gpu_fp16 --dtype float16 --tp_size=2
trtllm-build --checkpoint_dir ./tllm_3b_checkpoint_2gpu_fp16 --output_dir ./tmp/llama/3B/trt_engines/fp16/2-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.2-3B-lora/
- Run the model with the cpp runner
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0
- Got these results
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " 1.0. 1.0. 2. 1.0. 3. 1. 4. 1. 5. 1. 6. 1. 7. 1. 8. 1. 9. 1. 10. 1. 11. 1. 12. 1. 13. 1. 14. 1. 15. 1. 16. "
- Run the model with the python runner
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0 --use_py_session
- Got these results
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " (a) yes (b) sky is not blue
The question is not about the color of the sky, but about the color of the sky at a particular time of day. The sky appears blue during the daytime, but it can appear different colors at sunrise and sunset. So, the correct answer is (b) sky is not blue.
This question requires the ability to analyze the situation and understand the context, which is a key aspect of critical thinking. It also requires the ability to distinguish"
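When comparing the two runners, it helps to locate exactly where the outputs diverge rather than eyeballing the full strings. The helper below is a small diagnostic sketch (not part of the original reproduction); the sample strings are truncated copies of the outputs shown above:

```python
def first_divergence(a: str, b: str) -> int:
    """Index of the first character where a and b differ, or -1 if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    # One string is a strict prefix of the other: they diverge at its end.
    return -1 if len(a) == len(b) else min(len(a), len(b))

cpp_out = " 1.0. 1.0. 2. 1.0. 3. 1."
py_out = " (a) yes (b) sky is not blue"
print(first_divergence(cpp_out, py_out))  # diverges at index 1, i.e. from the first token
```

Here the outputs differ from the very first generated token, which suggests the cpp runner is not applying the LoRA weights correctly under tensor parallelism rather than drifting after a few steps.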
Expected behavior
The python runner and the cpp runner should give the same results
actual behavior
The python runner and the cpp runner give totally different results, and the results from the cpp runner are clearly wrong
additional notes
N/A
I met the same problem: the cpp runner gives different output compared with the python runner, but I don't know why.