TensorRT-LLM
Bus error running t5 conversion script using the latest main
System Info
GPU: A10G. I have tried both an AWS g5.2xlarge instance and an AWS g5.12xlarge instance.
Who can help?
@byshiue
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I pretty much followed the official installation steps:
- docker run --shm-size=2g --rm --runtime=nvidia --gpus all --entrypoint /bin/bash -it nvidia/cuda:12.1.0-devel-ubuntu22.04
- apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git python-is-python3 vim
- pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
- git clone https://github.com/NVIDIA/TensorRT-LLM.git (05/02 version)
- cd TensorRT-LLM
export MODEL_TYPE="t5"
export MODEL_NAME="google/flan-t5-large"
export INFERENCE_PRECISION="float32"
export TP_SIZE=1
export PP_SIZE=1
export WORLD_SIZE=1
python examples/enc_dec/convert_checkpoint.py --model_type ${MODEL_TYPE} \
    --model_dir ${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/${INFERENCE_PRECISION} \
    --tp_size ${TP_SIZE} \
    --pp_size ${PP_SIZE} \
    --weight_data_type float32 \
    --dtype ${INFERENCE_PRECISION}
Expected behavior
Model converted
actual behavior
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024043000
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Bus error (core dumped)
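For reference, a quick check of how much shared memory the container actually has (a minimal diagnostic sketch, assuming the crash is /dev/shm-related, which the replies below confirm):

```python
# Diagnostic sketch: report /dev/shm capacity inside the container.
# With `docker run --shm-size=2g` the total is only 2 GiB, less than the
# ~3 GiB of float32 flan-t5-large weights staged through multiprocessing.
import shutil

total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: total={total / 2**30:.2f} GiB, "
      f"used={used / 2**30:.2f} GiB, free={free / 2**30:.2f} GiB")
```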
additional notes
I also tried the bart model with the same script, and it exits successfully; just change to export MODEL_TYPE="bart" and export MODEL_NAME="facebook/bart-large-cnn". So this might be a problem specific to the t5 architecture, or it could be related to the GPU type I'm using (A10G).
Also running into the same issue, but on a single A100 80 GB GPU.
Having the same issue on a single A100 80 GB, converting a t5.
Investigating.
Hi @sc-gr @aravindMahadevan @TeamSeshDeadBoy,
I was able to reproduce the error with the reproduction steps. In short, the reason is that the memory available to Python multiprocessing is limited by the docker run settings.
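To illustrate the mechanism with a minimal sketch (a hypothetical repro, not the convert_checkpoint.py code): Python's multiprocessing shared memory is backed by files in /dev/shm, and on a tmpfs that is too small, reserving the space can succeed while the first write to the missing pages is killed with SIGBUS, which surfaces as exactly "Bus error (core dumped)" with no Python traceback:

```python
# Hypothetical repro sketch, assuming a container started with
# --shm-size=2g. multiprocessing.shared_memory maps a file in /dev/shm;
# reserving more than the tmpfs holds can succeed, but touching the
# pages faults with SIGBUS and kills the process outright, which is why
# the conversion dies with "Bus error" instead of a Python exception.
import numpy as np
from multiprocessing import shared_memory

SIZE = 4 * 2**30  # 4 GiB, deliberately larger than the 2 GiB /dev/shm

shm = shared_memory.SharedMemory(create=True, size=SIZE)
arr = np.ndarray((SIZE,), dtype=np.uint8, buffer=shm.buf)
arr[:] = 1  # the write, not the reservation, triggers SIGBUS
# The cleanup below is reached only if /dev/shm was large enough.
shm.close()
shm.unlink()
```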
I've tested it myself on the same A100 node: starting the container with the command below (the rest of the steps can stay the same) solves the problem:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --entrypoint /bin/bash --runtime=nvidia -it nvidia/cuda:12.1.0-devel-ubuntu22.04
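For context: --ipc=host makes the container share the host's IPC namespace, so /dev/shm is no longer capped at Docker's 64 MB default or an explicit --shm-size, and --ulimit memlock=-1 lifts the locked-memory limit. If sharing the host IPC namespace is not desirable, a sufficiently large --shm-size (e.g. --shm-size=16g) should also avoid the bus error.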
The official installation guide is the faster way to install, but it may not work well for larger models (in this case, it hit a Python multiprocessing error). The official build from source guide can be more reliable, as it builds directly from the latest cloned repository instead of from pip packages.
Thanks!
It works after adding --ulimit memlock=-1 --ulimit stack=67108864, thanks!
Update: the bus error is triggered by not adding the --ipc=host argument.
docker run -it --gpus=all --ipc=host --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 --runtime=nvidia [DOCKER_IMAGE] bash
This does not trigger the bus error.
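Note that with --ipc=host the container uses the host's /dev/shm mount directly, so the --shm-size value is effectively ignored; the /dev/shm check sketched above should then report the host's full shared memory size.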