TensorRT-LLM
Bus error running t5 conversion script using the latest main
System Info
GPU: A10G. I have tried both an AWS g5.2xlarge instance and an AWS g5.12xlarge instance.
Who can help?
@byshiue
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I pretty much followed the official installation steps:
- docker run --shm-size=2g --rm --runtime=nvidia --gpus all --entrypoint /bin/bash -it nvidia/cuda:12.1.0-devel-ubuntu22.04
- apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git python-is-python3 vim
- pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
- git clone https://github.com/NVIDIA/TensorRT-LLM.git (05/02 version)
- cd TensorRT-LLM
export MODEL_TYPE="t5"
export MODEL_NAME="google/flan-t5-large"
export INFERENCE_PRECISION="float32"
export TP_SIZE=1
export PP_SIZE=1
export WORLD_SIZE=1
python examples/enc_dec/convert_checkpoint.py --model_type ${MODEL_TYPE} \
    --model_dir ${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/${INFERENCE_PRECISION} \
    --tp_size ${TP_SIZE} \
    --pp_size ${PP_SIZE} \
    --weight_data_type float32 \
    --dtype ${INFERENCE_PRECISION}
Expected behavior
Model converted
actual behavior
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024043000
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Bus error (core dumped)
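For reference, a quick check of how much shared memory the container actually has (a minimal diagnostic sketch, assuming the crash is /dev/shm-related, which the replies below confirm):

```python
# Diagnostic sketch: report /dev/shm capacity inside the container.
# With `docker run --shm-size=2g` the total is only 2 GiB, less than the
# ~3 GiB of float32 flan-t5-large weights staged through multiprocessing.
import shutil

total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: total={total / 2**30:.2f} GiB, "
      f"used={used / 2**30:.2f} GiB, free={free / 2**30:.2f} GiB")
```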
additional notes
I also tried the bart model with the same script, and it exits successfully; just change to export MODEL_TYPE="bart" and export MODEL_NAME="facebook/bart-large-cnn". So this might be a problem specific to the t5 architecture, or it could be related to the GPU type I'm using (A10G).
Also running into the same issue, but on a single A100 80 GB GPU.
Having the same issue on a single A100 80 GB, converting a t5.
Investigating.
Hi @sc-gr @aravindMahadevan @TeamSeshDeadBoy,
I was able to reproduce the error with the reproduction steps. In short, the reason is that the memory available to Python multiprocessing is limited by the docker run settings.
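To illustrate the mechanism with a minimal sketch (a hypothetical repro, not the convert_checkpoint.py code): Python's multiprocessing shared memory is backed by files in /dev/shm, and on a tmpfs that is too small, reserving the space can succeed while the first write to the missing pages is killed with SIGBUS, which surfaces as exactly "Bus error (core dumped)" with no Python traceback:

```python
# Hypothetical repro sketch, assuming a container started with
# --shm-size=2g. multiprocessing.shared_memory maps a file in /dev/shm;
# reserving more than the tmpfs holds can succeed, but touching the
# pages faults with SIGBUS and kills the process outright, which is why
# the conversion dies with "Bus error" instead of a Python exception.
import numpy as np
from multiprocessing import shared_memory

SIZE = 4 * 2**30  # 4 GiB, deliberately larger than the 2 GiB /dev/shm

shm = shared_memory.SharedMemory(create=True, size=SIZE)
arr = np.ndarray((SIZE,), dtype=np.uint8, buffer=shm.buf)
arr[:] = 1  # the write, not the reservation, triggers SIGBUS
# The cleanup below is reached only if /dev/shm was large enough.
shm.close()
shm.unlink()
```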
I've tested it myself on the same A100 node: starting the container with the command below (the rest of the steps can stay the same) solves the problem:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --entrypoint /bin/bash --runtime=nvidia -it nvidia/cuda:12.1.0-devel-ubuntu22.04
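For context: --ipc=host makes the container share the host's IPC namespace, so /dev/shm is no longer capped at Docker's 64 MB default or an explicit --shm-size, and --ulimit memlock=-1 lifts the locked-memory limit. If sharing the host IPC namespace is not desirable, a sufficiently large --shm-size (e.g. --shm-size=16g) should also avoid the bus error.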
The official installation guide is the faster way to install, but it may not work well for larger models (in this case, it hit a Python multiprocessing error). The official build from source guide can be more reliable, as it builds directly from the latest cloned repository instead of from pip packages.
Thanks!
It works after adding --ulimit memlock=-1 --ulimit stack=67108864, thanks!
Update: the bus error is triggered by not adding the --ipc=host argument.
docker run -it --gpus=all --ipc=host --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 --runtime=nvidia [DOCKER_IMAGE] bash
This does not trigger the bus error.
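Note that with --ipc=host the container uses the host's /dev/shm mount directly, so the --shm-size value is effectively ignored; the /dev/shm check sketched above should then report the host's full shared memory size.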