TensorRT-LLM
Unable to convert Llama-2-7b-chat-hf model to TensorRT-LLM engine
System Info
Instance Type: g5.12xlarge (vCPUs: 48, GPUs: 4, GPU Mem: 96GiB) (AWS Amazon SageMaker Notebook Instance)
GPU Family: NVIDIA A10G Tensor Core GPUs
OS: Amazon Linux 2
TensorRT-LLM: v0.7.1 (Stable)
CUDA Version: 12.2
Driver Version: 535.104.12
Who can help?
@byshiue @juney-nvidia
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Download model weights from HF
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
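Note that meta-llama/Llama-2-7b-chat-hf is a gated repository, so the clone only succeeds after access has been granted on Hugging Face. One way to pass credentials is to embed a token in the clone URL (the username and token below are placeholders):
# <hf_username> and <hf_token> are placeholders for your own credentials
git clone https://<hf_username>:<hf_token>@huggingface.co/meta-llama/Llama-2-7b-chat-hf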
- Download the NVIDIA CUDA devel container using nvidia-docker. Mount the model weights into the /Llama-2-7b-chat-hf folder inside the container
nvidia-docker run \
-v $(pwd)/Llama-2-7b-chat-hf:/Llama-2-7b-chat-hf \
--name trtllm \
--entrypoint /bin/bash -it nvidia/cuda:12.1.0-devel-ubuntu22.04
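Before installing anything, it is worth confirming that the container actually sees all four A10G GPUs (a sanity check, not part of the original steps):
# Inside the container: should list 4x NVIDIA A10G
nvidia-smi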
- Install the tensorrt_llm Python package inside the container
# Inside the container
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
# Install TensorRT-LLM package
pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
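Note that -U upgrades to whatever version is newest on the index, which may not match the example scripts used later. Pinning the release explicitly is a hedge against that mismatch (the pinned version below is chosen to match the system info above):
# Pin the wheel to the intended stable release instead of "latest"
pip3 install tensorrt_llm==0.7.1 --extra-index-url https://pypi.nvidia.com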
- Verify the installation of the tensorrt_llm package
# verify installation
python3 -c "import tensorrt_llm"
NOTE: This results in the error described in Issue 808. Fix it by running the following command:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
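After the workaround, re-checking both packages confirms the import now works and shows which versions ended up installed (a quick sanity check):
# Confirm the import succeeds and print the installed versions
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"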
- Clone the TensorRT-LLM GitHub repo so that we can use the examples folder
# Clone TensorRT-LLM github repo
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
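Cloning without a tag tracks main, whose examples follow the newest, refactored package layout and may be out of sync with the installed wheel. A possible fix (my assumption, untested here) is to check out the release tag matching the wheel:
# Assumption: match the examples to the installed v0.7.1 wheel
git checkout v0.7.1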
- Call convert_checkpoint.py inside the TensorRT-LLM/examples/llama folder
cd TensorRT-LLM
pip install -r requirements.txt
cd examples/llama
pip install -r requirements.txt
python3 convert_checkpoint.py --model_dir /Llama-2-7b-chat-hf/ --output_dir ./tllm_1gpu_fp16 --dtype float16
ERROR:
Traceback (most recent call last):
File "/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 27, in <module>
from tensorrt_llm.models.llama.weight import (load_from_fp8_llama,
ModuleNotFoundError: No module named 'tensorrt_llm.models.llama.weight'
Expected behavior
The command below should successfully convert the model weights to the TensorRT-LLM checkpoint format.
python3 convert_checkpoint.py --model_dir /Llama-2-7b-chat-hf/ --output_dir ./tllm_1gpu_fp16 --dtype float16
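For reference, a successful single-GPU float16 conversion is expected to leave a checkpoint config plus a per-rank weight file in the output directory (the exact file names are my assumption from the examples):
# Hypothetical expected contents of the output directory
ls ./tllm_1gpu_fp16
# config.json  rank0.safetensors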
Actual behavior
python3 convert_checkpoint.py --model_dir /Llama-2-7b-chat-hf/ --output_dir ./tllm_1gpu_fp16 --dtype float16
ERROR:
Traceback (most recent call last):
File "/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 27, in <module>
from tensorrt_llm.models.llama.weight import (load_from_fp8_llama,
ModuleNotFoundError: No module named 'tensorrt_llm.models.llama.weight'
Additional notes
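For triage, a quick check of what the installed wheel actually ships under tensorrt_llm.models.llama may help; if weight.py is absent, the installed package and the cloned examples are out of sync (a hypothetical diagnostic, not something from the original report):
# List the files the installed package provides for the llama model
python3 -c "import tensorrt_llm.models.llama as m, os; print(sorted(os.listdir(os.path.dirname(m.__file__))))"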