TensorRT-LLM
Unable to convert Llama-2-7b-chat-hf model to TensorRT-LLM engine
System Info
Instance Type: g5.12xlarge (vCPUs: 48, GPUs: 4, GPU Mem: 96GiB) (AWS Amazon SageMaker Notebook Instance)
GPU Family: NVIDIA A10G Tensor Core GPUs
OS: Amazon Linux 2
TensorRT-LLM: v0.7.1 (Stable)
CUDA Version: 12.2
Driver Version: 535.104.12
Who can help?
@byshiue @juney-nvidia
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Download model weights from HF
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
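Note that meta-llama/Llama-2-7b-chat-hf is a gated repository, so the clone only succeeds after access has been granted on Hugging Face. One way to pass credentials is to embed a token in the clone URL (the username and token below are placeholders):
# <hf_username> and <hf_token> are placeholders for your own credentials
git clone https://<hf_username>:<hf_token>@huggingface.co/meta-llama/Llama-2-7b-chat-hf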
- Download the NVIDIA CUDA devel container using nvidia-docker. Mount the model weights into the /Llama-2-7b-chat-hf folder inside the container
nvidia-docker run \
-v $(pwd)/Llama-2-7b-chat-hf:/Llama-2-7b-chat-hf \
--name trtllm \
--entrypoint /bin/bash -it nvidia/cuda:12.1.0-devel-ubuntu22.04
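Before installing anything, it is worth confirming that the container actually sees all four A10G GPUs (a sanity check, not part of the original steps):
# Inside the container: should list 4x NVIDIA A10G
nvidia-smi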
- Install the tensorrt_llm Python package inside the container
# Inside the container
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
# Install TensorRT-LLM package
pip3 install tensorrt_llm -U --extra-index-url https://pypi.nvidia.com
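Note that -U upgrades to whatever version is newest on the index, which may not match the example scripts used later. Pinning the release explicitly is a hedge against that mismatch (the pinned version below is chosen to match the system info above):
# Pin the wheel to the intended stable release instead of "latest"
pip3 install tensorrt_llm==0.7.1 --extra-index-url https://pypi.nvidia.com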
- Verify the installation of the tensorrt_llm package
# verify installation
python3 -c "import tensorrt_llm"
NOTE: This results in the error described in Issue 808. Fix it by running the following command:
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
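After the workaround, re-checking both packages confirms the import now works and shows which versions ended up installed (a quick sanity check):
# Confirm the import succeeds and print the installed versions
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"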
- Clone the TensorRT-LLM GitHub repo so that we can use the examples folder
# Clone TensorRT-LLM github repo
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
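Cloning without a tag tracks main, whose examples follow the newest, refactored package layout and may be out of sync with the installed wheel. A possible fix (my assumption, untested here) is to check out the release tag matching the wheel:
# Assumption: match the examples to the installed v0.7.1 wheel
git checkout v0.7.1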
- Call convert_checkpoint.py inside the TensorRT-LLM/examples/llama folder
cd TensorRT-LLM
pip install -r requirements.txt
cd examples/llama
pip install -r requirements.txt
python3 convert_checkpoint.py --model_dir /Llama-2-7b-chat-hf/ --output_dir ./tllm_1gpu_fp16 --dtype float16
ERROR:
Traceback (most recent call last):
File "/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 27, in <module>
from tensorrt_llm.models.llama.weight import (load_from_fp8_llama,
ModuleNotFoundError: No module named 'tensorrt_llm.models.llama.weight'
Expected behavior
The command below should successfully convert the model weights to the TensorRT-LLM checkpoint format.
python3 convert_checkpoint.py --model_dir /Llama-2-7b-chat-hf/ --output_dir ./tllm_1gpu_fp16 --dtype float16
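For reference, a successful single-GPU float16 conversion is expected to leave a checkpoint config plus a per-rank weight file in the output directory (the exact file names are my assumption from the examples):
# Hypothetical expected contents of the output directory
ls ./tllm_1gpu_fp16
# config.json  rank0.safetensors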
Actual behavior
python3 convert_checkpoint.py --model_dir /Llama-2-7b-chat-hf/ --output_dir ./tllm_1gpu_fp16 --dtype float16
ERROR:
Traceback (most recent call last):
File "/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 27, in <module>
from tensorrt_llm.models.llama.weight import (load_from_fp8_llama,
ModuleNotFoundError: No module named 'tensorrt_llm.models.llama.weight'
Additional notes
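For triage, a quick check of what the installed wheel actually ships under tensorrt_llm.models.llama may help; if weight.py is absent, the installed package and the cloned examples are out of sync (a hypothetical diagnostic, not something from the original report):
# List the files the installed package provides for the llama model
python3 -c "import tensorrt_llm.models.llama as m, os; print(sorted(os.listdir(os.path.dirname(m.__file__))))"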