
KeyError: 6 when getting nvlink_bandwidth

choyuansu opened this issue 10 months ago · 1 comment

System Info

GPU: NVIDIA RTX A6000

Who can help?

@Tracin

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

  1. Run git clone https://github.com/NVIDIA/TensorRT-LLM.git
  2. Create Dockerfile and docker-compose.yaml in TensorRT-LLM/
    Dockerfile
    # Obtain and start the basic docker image environment.
    FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
    
    # Install dependencies; TensorRT-LLM requires Python 3.10
    RUN apt-get update && apt-get -y install \
        python3.10 \
        python3-pip \
        openmpi-bin \
        libopenmpi-dev
    
    # Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
    # If you want to install the stable version (corresponding to the release branch), please
    # remove the `--pre` option.
    RUN --mount=type=cache,target=/root/.cache/pip pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
    
    COPY ./examples/qwen/requirements.txt .
    RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r requirements.txt
    
    WORKDIR /workdir
    
    docker-compose.yaml
    services:
      tensorrt:
        image: tensorrt-llm
        volumes:
          - .:/workdir
          - /mnt/models:/mnt/models
        command:
        - bash
        - -ec
        - |
          cd examples/qwen
          pip install -r requirements.txt
          python3 convert_checkpoint.py --model_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/ \
                    --dtype float32 \
                    --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/
          trtllm-build --checkpoint_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/ \
                    --gemm_plugin float32 \
                    --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_engines/fp32/1-gpu/
        deploy:
            resources:
              reservations:
                devices:
                  - driver: nvidia
                    count: 1
                    capabilities: [gpu]
    
  3. Run git clone https://huggingface.co/Qwen/Qwen-7B-Chat in /mnt/models/Large_Language_Model
  4. Run docker compose up

Expected behavior

trtllm-build completes and produces the engine without error.

Actual behavior

[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink is active: True
[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink version: 6
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 411, in main
    cluster_config = infer_cluster_config()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 523, in infer_cluster_config
    cluster_info=infer_cluster_info(),
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 487, in infer_cluster_info
    nvl_bw = nvlink_bandwidth(nvl_version)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 433, in nvlink_bandwidth
    return nvl_bw_table[nvlink_version]
KeyError: 6

Additional notes

Relevant code: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/auto_parallel/cluster_info.py#L427-L433
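
For context, the cluster probe looks the NVML-reported NVLink version up in a small bandwidth table, and that table has no entry for 6. The version that ends up as the offending key can be checked directly with pynvml; a quick sketch, assuming pynvml (nvidia-ml-py) is installed and that this is the same query the probe performs:

import pynvml

# Query the NVLink version NVML reports for link 0 of GPU 0.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
version = pynvml.nvmlDeviceGetNvLinkVersion(handle, 0)  # link 0
print(f"NVLink version reported by NVML: {version}")  # 6 on this A6000 setup
pynvml.nvmlShutdown()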

I can't find any information online about NVLink version 6's bandwidth.
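
As a stopgap until the table is extended, one could guard the lookup and fall back to the newest known generation instead of raising. A minimal sketch of such a patch; the function shape mirrors the linked code, but the table values below are placeholders, not the real entries:

def nvlink_bandwidth(nvlink_version: int):
    # Known NVLink generation -> bandwidth in GB/s (placeholder values).
    nvl_bw_table = {
        1: 80,
        2: 150,
        3: 300,
        4: 450,
    }
    if nvlink_version in nvl_bw_table:
        return nvl_bw_table[nvlink_version]
    # Unknown (newer) version: assume at least the bandwidth of the
    # newest generation in the table rather than raising KeyError.
    return nvl_bw_table[max(nvl_bw_table)]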

choyuansu · Apr 17, 2024

Thank you for the report. We will fix it in the next update.

byshiue · Apr 22, 2024