KeyError: 6 when getting nvlink_bandwidth
System Info
GPU: NVIDIA RTX A6000
Who can help?
@Tracin
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Run `git clone https://github.com/NVIDIA/TensorRT-LLM.git`
- Create `Dockerfile` and `docker-compose.yaml` in `TensorRT-LLM/`
Dockerfile

```dockerfile
# Obtain and start the basic docker image environment.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

# Install dependencies, TensorRT-LLM requires Python 3.10
RUN apt-get update && apt-get -y install \
    python3.10 \
    python3-pip \
    openmpi-bin \
    libopenmpi-dev

# Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
# If you want to install the stable version (corresponding to the release branch), please
# remove the `--pre` option.
RUN --mount=type=cache,target=/root/.cache/pip pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

COPY ./examples/qwen/requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r requirements.txt

WORKDIR /workdir
```
docker-compose.yaml

```yaml
services:
  tensorrt:
    image: tensorrt-llm
    volumes:
      - .:/workdir
      - /mnt/models:/mnt/models
    command:
      - bash
      - -ec
      - |
        cd examples/qwen
        pip install -r requirements.txt
        python3 convert_checkpoint.py --model_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/ \
            --dtype float32 \
            --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/
        trtllm-build --checkpoint_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/ \
            --gemm_plugin float32 \
            --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_engines/fp32/1-gpu/
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```
- Run `git clone https://huggingface.co/Qwen/Qwen-7B-Chat` in `/mnt/models/Large_Language_Model`
- Run `docker compose up`
Expected behavior
No error
Actual behavior
```
[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink is active: True
[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink version: 6
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 411, in main
    cluster_config = infer_cluster_config()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 523, in infer_cluster_config
    cluster_info=infer_cluster_info(),
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 487, in infer_cluster_info
    nvl_bw = nvlink_bandwidth(nvl_version)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 433, in nvlink_bandwidth
    return nvl_bw_table[nvlink_version]
KeyError: 6
```
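For reference, the version value that trips the table lookup can be double-checked directly through NVML, which appears to be where the log lines above get it from. A minimal diagnostic sketch, assuming `pynvml` is installed (`pip install nvidia-ml-py`) and GPU 0 has an active NVLink link 0:

```python
# Sketch: query the NVLink version NVML reports for link 0 of GPU 0.
# This is a standalone check, not code taken from TensorRT-LLM itself.
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    # Same NVML query family the TRT-LLM "NVLink version" log line comes from.
    print("NVLink version:", pynvml.nvmlDeviceGetNvLinkVersion(handle, 0))
finally:
    pynvml.nvmlShutdown()
```

On this RTX A6000 this should print `6`, matching the value that `nvl_bw_table` has no entry for.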
Additional notes
Relevant code: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/auto_parallel/cluster_info.py#L427-L433
I can't find any published bandwidth figure for NVLink version 6 online.
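As a stopgap until the table is extended upstream, the lookup could fall back to the nearest known generation instead of raising. This is only a sketch of one possible local patch to `nvlink_bandwidth()`; the version keys and GB/s values below are illustrative placeholders, not the real entries from `cluster_info.py` (see the link above):

```python
# Sketch of a defensive fallback for the lookup that raises KeyError: 6.
# The table contents here are placeholders; the actual values live in
# tensorrt_llm/auto_parallel/cluster_info.py.
def nvlink_bandwidth(nvlink_version: int) -> float:
    nvl_bw_table = {
        # version: bandwidth in GB/s (illustrative values only)
        2: 150,
        3: 300,
        4: 450,
    }
    if nvlink_version in nvl_bw_table:
        return nvl_bw_table[nvlink_version]
    # Unknown/newer version (e.g. the 6 reported here): fall back to the
    # highest generation in the table instead of raising KeyError.
    return nvl_bw_table[max(nvl_bw_table)]
```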
Thank you for the report. We will fix it in the next update.