
KeyError: 6 when getting nvlink_bandwidth

choyuansu opened this issue 10 months ago · 1 comment

System Info

GPU: NVIDIA RTX A6000

Who can help?

@Tracin

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

  1. Run git clone https://github.com/NVIDIA/TensorRT-LLM.git
  2. Create Dockerfile and docker-compose.yaml in TensorRT-LLM/
    Dockerfile
    # Obtain and start the basic docker image environment.
    FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
    
    # Install dependencies; TensorRT-LLM requires Python 3.10
    RUN apt-get update && apt-get -y install \
        python3.10 \
        python3-pip \
        openmpi-bin \
        libopenmpi-dev
    
    # Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
    # If you want to install the stable version (corresponding to the release branch), please
    # remove the `--pre` option.
    RUN --mount=type=cache,target=/root/.cache/pip pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
    
    COPY ./examples/qwen/requirements.txt .
    RUN --mount=type=cache,target=/root/.cache/pip pip3 install -r requirements.txt
    
    WORKDIR /workdir
    
    docker-compose.yaml
    services:
      tensorrt:
        image: tensorrt-llm
        volumes:
          - .:/workdir
          - /mnt/models:/mnt/models
        command:
        - bash
        - -ec
        - |
          cd examples/qwen
          pip install -r requirements.txt
          python3 convert_checkpoint.py --model_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/ \
                    --dtype float32 \
                    --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/
          trtllm-build --checkpoint_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_ckpt/fp32/1-gpu/ \
                    --gemm_plugin float32 \
                    --output_dir /mnt/models/Large_Language_Model/Qwen-7B-Chat/trt_engines/fp32/1-gpu/
        deploy:
            resources:
              reservations:
                devices:
                  - driver: nvidia
                    count: 1
                    capabilities: [gpu]
    
  3. Run git clone https://huggingface.co/Qwen/Qwen-7B-Chat in /mnt/models/Large_Language_Model
  4. Run docker compose up

Expected behavior

trtllm-build completes and produces the engine without error.

Actual behavior

[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink is active: True
[04/16/2024-22:50:23] [TRT-LLM] [I] NVLink version: 6
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 411, in main
    cluster_config = infer_cluster_config()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 523, in infer_cluster_config
    cluster_info=infer_cluster_info(),
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 487, in infer_cluster_info
    nvl_bw = nvlink_bandwidth(nvl_version)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/auto_parallel/cluster_info.py", line 433, in nvlink_bandwidth
    return nvl_bw_table[nvlink_version]
KeyError: 6

Additional notes

Relevant code: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/auto_parallel/cluster_info.py#L427-L433
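
For context, the cluster probe looks the NVML-reported NVLink version up in a small bandwidth table, and that table has no entry for 6. The version that ends up as the offending key can be checked directly with pynvml; a quick sketch, assuming pynvml (nvidia-ml-py) is installed and that this is the same query the probe performs:

import pynvml

# Query the NVLink version NVML reports for link 0 of GPU 0.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
version = pynvml.nvmlDeviceGetNvLinkVersion(handle, 0)  # link 0
print(f"NVLink version reported by NVML: {version}")  # 6 on this A6000 setup
pynvml.nvmlShutdown()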

I can't find any information online about NVLink version 6's bandwidth.
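
As a stopgap until the table is extended, one could guard the lookup and fall back to the newest known generation instead of raising. A minimal sketch of such a patch; the function shape mirrors the linked code, but the table values below are placeholders, not the real entries:

def nvlink_bandwidth(nvlink_version: int):
    # Known NVLink generation -> bandwidth in GB/s (placeholder values).
    nvl_bw_table = {
        1: 80,
        2: 150,
        3: 300,
        4: 450,
    }
    if nvlink_version in nvl_bw_table:
        return nvl_bw_table[nvlink_version]
    # Unknown (newer) version: assume at least the bandwidth of the
    # newest generation in the table rather than raising KeyError.
    return nvl_bw_table[max(nvl_bw_table)]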

choyuansu · Apr 17, 2024

Thank you for the report. We will fix it in the next update.

byshiue · Apr 22, 2024