TensorRT-LLM
Inference of Qwen1.5-14B with 2x RTX 4090D fails on the main branch
Inference on the main branch fails; the command and error log are below:
]# mpirun -n 2 --allow-run-as-root /app/tensorrt_llm/benchmarks/cpp/gptSessionBenchmark --engine_dir ./examples/qwen/trtModel/fp16 --warm_up 2 --batch_size 1 --duration 0 --num_runs 3 --input_output_len 32,1 --log_level info
Your environment does not support peer access, so you need to disable use_custom_all_reduce when building the engine.
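For reference, a sketch of what the rebuild could look like. The exact flag spelling and accepted values vary between TensorRT-LLM releases (some expose it through trtllm-build, others through a per-model build.py), so check the build script's --help for your checkout; the paths below are placeholders taken from the command above:

```shell
# Rebuild the engine with the custom all-reduce kernel disabled, falling
# back to NCCL for cross-GPU reductions. Flag name/values are an
# assumption for illustration; verify against your TensorRT-LLM version.
trtllm-build \
    --checkpoint_dir ./examples/qwen/trtModel/ckpt \
    --output_dir ./examples/qwen/trtModel/fp16 \
    --use_custom_all_reduce disable
```

After rebuilding, the same mpirun benchmark command should run without the peer-access error.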
Thanks, it works. What does use_custom_all_reduce mean?
It uses a customized all-reduce kernel instead of the NCCL all-reduce API.
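Conceptually, an all-reduce (whether NCCL's or the custom kernel) leaves every rank holding the reduction of all ranks' inputs; in tensor parallelism each GPU contributes its partial activations and receives the summed result. A minimal single-process sketch in Python, purely illustrative and not TensorRT-LLM code:

```python
def all_reduce_sum(per_rank_vectors):
    """Simulate a sum all-reduce: every rank ends up with the
    element-wise sum of all ranks' contributions."""
    length = len(per_rank_vectors[0])
    reduced = [sum(vec[i] for vec in per_rank_vectors) for i in range(length)]
    # Every rank receives an identical copy of the reduced result.
    return [list(reduced) for _ in per_rank_vectors]

# Two "GPUs", each holding partial sums from a tensor-parallel matmul.
rank0 = [1.0, 2.0, 3.0]
rank1 = [10.0, 20.0, 30.0]
results = all_reduce_sum([rank0, rank1])
# Both ranks now hold [11.0, 22.0, 33.0]
```

The custom kernel can exploit direct GPU-to-GPU peer access to exchange these values, which is why it must be disabled when the platform does not support peer access.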