TensorRT-LLM
Inference of Qwen1.5-14B with 2x RTX 4090D fails on the main branch
Inference on the main branch fails; the command and error log are below:
]# mpirun -n 2 --allow-run-as-root /app/tensorrt_llm/benchmarks/cpp/gptSessionBenchmark --engine_dir ./examples/qwen/trtModel/fp16 --warm_up 2 --batch_size 1 --duration 0 --num_runs 3 --input_output_len 32,1 --log_level info
Your environment does not support peer access, so you need to disable use_custom_all_reduce when building the engine.
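For reference, a sketch of what the rebuild could look like. The exact flag spelling and accepted values vary between TensorRT-LLM releases (some expose it through trtllm-build, others through a per-model build.py), so check the build script's --help for your checkout; the paths below are placeholders taken from the command above:

```shell
# Rebuild the engine with the custom all-reduce kernel disabled, falling
# back to NCCL for cross-GPU reductions. Flag name/values are an
# assumption for illustration; verify against your TensorRT-LLM version.
trtllm-build \
    --checkpoint_dir ./examples/qwen/trtModel/ckpt \
    --output_dir ./examples/qwen/trtModel/fp16 \
    --use_custom_all_reduce disable
```

After rebuilding, the same mpirun benchmark command should run without the peer-access error.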
Thanks, it works. What does use_custom_all_reduce mean?
It uses a customized all-reduce kernel instead of the NCCL all-reduce API.
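Conceptually, an all-reduce (whether NCCL's or the custom kernel) leaves every rank holding the reduction of all ranks' inputs; in tensor parallelism each GPU contributes its partial activations and receives the summed result. A minimal single-process sketch in Python, purely illustrative and not TensorRT-LLM code:

```python
def all_reduce_sum(per_rank_vectors):
    """Simulate a sum all-reduce: every rank ends up with the
    element-wise sum of all ranks' contributions."""
    length = len(per_rank_vectors[0])
    reduced = [sum(vec[i] for vec in per_rank_vectors) for i in range(length)]
    # Every rank receives an identical copy of the reduced result.
    return [list(reduced) for _ in per_rank_vectors]

# Two "GPUs", each holding partial sums from a tensor-parallel matmul.
rank0 = [1.0, 2.0, 3.0]
rank1 = [10.0, 20.0, 30.0]
results = all_reduce_sum([rank0, rank1])
# Both ranks now hold [11.0, 22.0, 33.0]
```

The custom kernel can exploit direct GPU-to-GPU peer access to exchange these values, which is why it must be disabled when the platform does not support peer access.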