Converting t5-3b (and larger) models to TensorRT
Description
Hello,
Everything works as expected for t5-large, but t5-3b OOMs while building the engine (using the code in the README).
The network itself only consumes 6448 MiB / 15109 MiB.
I tried setting the workspace size really low and freeing the memory held by the original plugins, but it didn't seem to help. Any advice would be appreciated.
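For reference, this is roughly how the workspace limit was lowered (a minimal sketch against the standard TensorRT 8.2 Python API that ships in the 22.05 container; the real engine-building code lives in the example script, so treat the exact names here as illustrative):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# Cap the builder workspace at 256 MiB (TensorRT 8.2 still uses
# max_workspace_size; newer releases use set_memory_pool_limit).
config.max_workspace_size = 256 << 20
config.set_flag(trt.BuilderFlag.FP16)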
Reproduced Steps
docker image: 22.05-py3
python ../examples/tensorrt/t5/extractT5ModelToBIN.py # get T5Model weight for test (need Internet)
CUDA_VISIBLE_DEVICES=0 python ../examples/tensorrt/t5/testT5Plugin.py \
--batch_size 1 \
--beam_width 4 \
--max_seq_len 16 \
--data_type fp16 \
--sampling_topk 1 \
--model t5-3b
I see you set data_type to fp32, which requires about 12 GB just to store the model. In that case, batch size 32 + beam width 4 + sequence length 128 may be too large.
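As a rough sanity check (assuming ~3 billion parameters for t5-3b):

# Approximate weight storage for t5-3b.
params = 3e9
print(f"fp32: {params * 4 / 1024**3:.1f} GiB")  # ~11.2 GiB
print(f"fp16: {params * 2 / 1024**3:.1f} GiB")  # ~5.6 GiB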
Sorry, that was pasted from the README. I did use fp16; I've updated the issue.
This is caused by loading the weights in the plugin constructor. Because TensorRT clones plugins multiple times while building the engine, the weights are currently loaded multiple times.
We will fix this bug in the next release.
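Conceptually, the fix is to make clone() share the already-loaded weights instead of re-reading them. A hypothetical Python sketch of the pattern (not the actual FasterTransformer plugin code, which is C++):

import numpy as np

class T5WeightCache:
    # Loads the weights once and hands out shared references.
    _weights = None

    @classmethod
    def get(cls, path):
        if cls._weights is None:
            cls._weights = np.load(path)  # read from disk exactly once
        return cls._weights

class T5Plugin:
    def __init__(self, weight_path):
        self.weight_path = weight_path
        # Share one copy of the weights instead of loading per instance.
        self.weights = T5WeightCache.get(weight_path)

    def clone(self):
        # TensorRT clones plugins while building the engine; each clone
        # reuses the cached weights, so memory no longer grows per clone.
        return T5Plugin(self.weight_path)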
Appreciate the update.
Also, I can't thank you enough for this repo; it's already added so much value for us.
@chessgecko The issue is fixed in the latest release. Thank you for the feedback.
Closing this bug because it is inactive. Feel free to re-open it if you still have any problems.