Converting t5-3b (and larger) models to TensorRT
Description
Hello,
Everything works as expected for t5-large, but t5-3b OOMs while building the engine (using the code in the README).
The network itself only consumes 6448 MiB / 15109 MiB.
I tried setting the workspace size really low and freeing the memory held by the original plugins, but it didn't seem to help. Any advice would be appreciated.
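For reference, this is roughly how the workspace limit was lowered (a minimal sketch against the standard TensorRT 8.2 Python API that ships in the 22.05 container; the real engine-building code lives in the example script, so treat the exact names here as illustrative):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Cap the builder workspace at 256 MiB (TensorRT 8.2 still uses
# max_workspace_size; newer releases use set_memory_pool_limit).
config.max_workspace_size = 256 << 20
config.set_flag(trt.BuilderFlag.FP16)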
Reproduced Steps
docker image: 22.05-py3
python ../examples/tensorrt/t5/extractT5ModelToBIN.py # get T5Model weight for test (need Internet)
CUDA_VISIBLE_DEVICES=0 python ../examples/tensorrt/t5/testT5Plugin.py \
--batch_size 1 \
--beam_width 4 \
--max_seq_len 16 \
--data_type fp16 \
--sampling_topk 1 \
--model t5-3b
I see you set data_type to fp32, which requires about 12 GB just to store the model. In that case, batch size 32 + beam width 4 + sequence length 128 may be too large.
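As a rough sanity check (assuming ~3 billion parameters for t5-3b):

# Approximate weight storage for t5-3b.
params = 3e9
print(f"fp32: {params * 4 / 1024**3:.1f} GiB")  # ~11.2 GiB
print(f"fp16: {params * 2 / 1024**3:.1f} GiB")  # ~5.6 GiB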
Sorry, that was pasted from the README. I did use fp16; I've updated the issue.
This is caused by loading the weights in the plugin constructor. Because TensorRT clones plugins multiple times while building the engine, the weights are currently loaded multiple times.
We will fix this bug in the next release.
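Conceptually, the fix is to make clone() share the already-loaded weights instead of re-reading them. A hypothetical Python sketch of the pattern (not the actual FasterTransformer plugin code, which is C++):

import numpy as np

class T5WeightCache:
    # Loads the weights once and hands out shared references.
    _weights = None

    @classmethod
    def get(cls, path):
        if cls._weights is None:
            cls._weights = np.load(path)  # read from disk exactly once
        return cls._weights

class T5Plugin:
    def __init__(self, weight_path):
        self.weight_path = weight_path
        # Share one copy of the weights instead of loading per instance.
        self.weights = T5WeightCache.get(weight_path)

    def clone(self):
        # TensorRT clones plugins while building the engine; each clone
        # reuses the cached weights, so memory no longer grows per clone.
        return T5Plugin(self.weight_path)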
Appreciate the update.
Also, I can't thank you enough for this repo; it's already added so much value for us.
@chessgecko The issue is fixed in the latest release. Thank you for the feedback.
Closing this bug because it is inactive. Feel free to re-open it if you still have any problems.