
C++ runner outputs wrong results when using LoRA + tensor parallelism

Open · ShuaiShao93 opened this issue 10 months ago · 1 comment

System Info

x86_64, debian 11, A100 GPUs

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

On a VM with 2 A100 GPUs:

  1. pip3 install tensorrt_llm==0.16.0 --extra-index-url https://pypi.nvidia.com/
  2. git clone -b v0.16.0 https://github.com/NVIDIA/TensorRT-LLM.git
  3. git clone https://huggingface.co/unsloth/Llama-3.2-3B-Instruct
  4. git clone https://huggingface.co/ss-galileo/llama-3.2-3B-lora
  5. Run the commands below to build the engine (a config sanity check is sketched after this list):
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Llama-3.2-3B-Instruct --output_dir ./tllm_3b_checkpoint_2gpu_fp16 --dtype float16 --tp_size=2

trtllm-build --checkpoint_dir ./tllm_3b_checkpoint_2gpu_fp16 --output_dir ./tmp/llama/3B/trt_engines/fp16/2-gpu  --gpt_attention_plugin auto  --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 8 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto  --lora_dir llama-3.2-3B-lora/
  6. Run the model with the C++ runner (a programmatic equivalent of both runner paths is sketched after this list):
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0
  7. Got these results:
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " 1.0. 1.0. 2. 1.0. 3. 1. 4. 1. 5. 1. 6. 1. 7. 1. 8. 1. 9. 1. 10. 1. 11. 1. 12. 1. 13. 1. 14. 1. 15. 1. 16. "
  8. Run the model with the Python runner:
mpirun -n 2 python3 TensorRT-LLM/examples/run.py --engine_dir=./tmp/llama/3B/trt_engines/fp16/2-gpu --max_output_len 100 --max_input_length=100000 --tokenizer_dir ./Llama-3.2-3B-Instruct --input_text "is sky blue?" --lora_dir llama-3.2-3B-lora/ --lora_task_uids 0 --use_py_session
  9. Got these results:
Input [Text 0]: "<|begin_of_text|>is sky blue?"
Output [Text 0 Beam 0]: " (a) yes (b) sky is not blue
The question is not about the color of the sky, but about the color of the sky at a particular time of day. The sky appears blue during the daytime, but it can appear different colors at sunrise and sunset. So, the correct answer is (b) sky is not blue.
This question requires the ability to analyze the situation and understand the context, which is a key aspect of critical thinking. It also requires the ability to distinguish"
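For reference, the build settings can be verified before running. Below is a minimal sanity check, assuming the config.json layout that trtllm-build writes into the engine directory in v0.16 (the key paths are an assumption, not a documented API):

```python
import json

# Sketch: trtllm-build writes a config.json into the engine directory.
# Verify the LoRA plugin and tensor parallelism were baked into the build.
# The key paths below are an assumption based on v0.16 engine configs.
with open("./tmp/llama/3B/trt_engines/fp16/2-gpu/config.json") as f:
    cfg = json.load(f)

print("lora_plugin:", cfg["build_config"]["plugin_config"]["lora_plugin"])
print("tp_size:", cfg["pretrained_config"]["mapping"]["tp_size"])
```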
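And here is a minimal sketch of the two code paths that examples/run.py takes in v0.16: without --use_py_session it drives ModelRunnerCpp (the C++ session), with the flag it drives ModelRunner (the Python session). The argument names follow examples/run.py in that release and should be read as assumptions rather than a stable API; launch it under mpirun -n 2 as above:

```python
# Compare the C++ session (ModelRunnerCpp) against the Python session
# (ModelRunner) on the same prompt and LoRA adapter.
# Run as: mpirun -n 2 python3 compare_runners.py
from transformers import AutoTokenizer
import tensorrt_llm
from tensorrt_llm.runtime import ModelRunner, ModelRunnerCpp

ENGINE_DIR = "./tmp/llama/3B/trt_engines/fp16/2-gpu"
LORA_DIR = "llama-3.2-3B-lora/"

rank = tensorrt_llm.mpi_rank()
tokenizer = AutoTokenizer.from_pretrained("./Llama-3.2-3B-Instruct")
batch_input_ids = [tokenizer("is sky blue?", return_tensors="pt").input_ids[0]]
end_id = tokenizer.eos_token_id

for cls in (ModelRunnerCpp, ModelRunner):
    runner = cls.from_dir(engine_dir=ENGINE_DIR, lora_dir=LORA_DIR, rank=rank)
    # lora_uids selects the adapter; "0" matches --lora_task_uids 0 above.
    outputs = runner.generate(
        batch_input_ids,
        lora_uids=["0"],
        max_new_tokens=100,
        end_id=end_id,
        pad_id=end_id,
    )
    if rank == 0:
        # Default return is a (batch, beams, seq_len) tensor of token ids,
        # with the input tokens included at the front.
        print(cls.__name__, "->",
              tokenizer.decode(outputs[0][0], skip_special_tokens=True))
    del runner  # free the session before building the next one
```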

Expected behavior

The Python runner and the C++ runner should give the same results.

Actual behavior

The Python runner and the C++ runner give completely different results, and the output from the C++ runner is clearly wrong.

Additional notes

N/A

ShuaiShao93 avatar Dec 28 '24 00:12 ShuaiShao93

I hit the same problem: the C++ runner gives different output than the Python runner, but I don't know why.

HPUedCSLearner avatar Mar 28 '25 09:03 HPUedCSLearner