TensorRT
How should I speed up the original exported T5 saved_model using TRT?
My env:
- Docker image: nvcr.io/nvidia/tensorflow:22.05-tf2-py3
- TRT: 8.2.5.1
- CUDA: 11.7
- TF: 2.8
- GPU: Tesla V100
The original saved_model takes 300 ms with batch_size=32 and sequence length 128, which is too slow to deploy. So I wanted to speed up T5 using TF-TRT. But when I convert the saved_model with the code below, TF-TRT doesn't work:
```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import tensorflow_text  # registers the SentencePiece ops used by the T5 saved_model
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()  # the exported saved_model is TF1.x

input_saved_model_dir = 'exported_model/batch32_length128_0810/1660123651'
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GiB of TRT workspace
    max_batch_size=32,
    minimum_segment_size=50,  # only offload subgraphs with >= 50 nodes to TRT
    precision_mode='FP32',
    is_dynamic_op=True,
    maximum_cached_engines=1)
converter.convert()
converter.save(output_saved_model_dir)
```
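(For reference, if the export were a TF2 SavedModel, the TF2 conversion API would look roughly like the sketch below. This is a minimal sketch assuming TF 2.8's `TrtGraphConverterV2`; it is not the path I used, since my export is TF1.x.)

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.FP32,
    max_workspace_size_bytes=1 << 32,
    minimum_segment_size=50,
    maximum_cached_engines=1)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=params)
converter.convert()
converter.save(output_saved_model_dir)
```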
Before using this code, you need to add some code to tensorflow/python/compiler/tensorrt/trt_convert.py. The reference is here. Could somebody help me with this?
@nvpohanh could you please look at this?
I think TRT should have good out-of-the-box performance for T5 now. Can you try exporting it to ONNX and checking the throughput? Also, TRT 8.4 should have better performance.
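A minimal sketch of the export, assuming the HF `t5-base` checkpoint and exporting just the encoder (the full T5 notebook splits the model into separate encoder and decoder ONNX files; the output file name is a placeholder):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()
tokenizer = T5Tokenizer.from_pretrained("t5-base")

# Wrap the encoder so the export returns a plain tensor instead of a ModelOutput
class EncoderWrapper(torch.nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids):
        return self.encoder(input_ids=input_ids).last_hidden_state

input_ids = tokenizer("translate English to German: hello",
                      return_tensors="pt").input_ids

torch.onnx.export(
    EncoderWrapper(model.encoder),
    (input_ids,),
    "t5_encoder.onnx",
    input_names=["input_ids"],
    output_names=["hidden_states"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "hidden_states": {0: "batch", 1: "seq"}},
    opset_version=13)
```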
> I think TRT should have good out-of-the-box performance for T5 now. Can you try exporting it to ONNX and checking the throughput?
@zerollzeng thanks for your answer. Does "export it to ONNX" mean exporting the converted saved_model to ONNX?
The normal way to speed up T5 is to use the HuggingFace PyTorch T5 model (following t5.ipynb):
- convert the HF PyTorch T5 model to ONNX
- convert the T5 ONNX model to a TRT engine (see the sketch after this list)
- since I need a saved_model, convert the TRT engine back to a saved_model (not sure about this step)
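A minimal sketch of the ONNX-to-TRT step with the TensorRT Python API, assuming TRT 8.2 and the encoder export discussed above (the file names are placeholders):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 32  # 4 GiB (TRT 8.2 API)

    # Pin an optimization profile to this thread's shapes (min, opt, max)
    profile = builder.create_optimization_profile()
    profile.set_shape("input_ids", (1, 1), (32, 128), (32, 128))
    config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)

build_engine("t5_encoder.onnx", "t5_encoder.plan")
```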
The way I used is:
- export the T5 model to a saved_model using the export method in the google-research/T5 repo (the exported saved_model is TF1.x)
- convert the saved_model to TRT using the code I posted in my first comment
I'm not sure my way is the right way to speed up T5, since it failed. Can I get a sped-up T5 saved_model from the original exported T5 model? Or should I use this repo's code (tensorrt/t5) to get a sped-up saved_model? Thanks again.
Dose "export it to onnx" mean using coverted saved_model export to onnx?
Yes
BTW, I used https://huggingface.co/docs/transformers/model_doc/t5 before, and I think it's more convenient to export to ONNX :-)
Thank you.
> I think it's more convenient to export to ONNX
Yes, I think so. But unfortunately, I need a saved_model to deploy the service. I tried this:
> Or should I use this repo's code (tensorrt/t5) to get a sped-up saved_model?
The speed with {V100, batch_size 32, seq_len 128, TRT 8.2.5} is 107 ms. What's your suggestion for speeding up T5: the {HF T5 export + ONNX} way or the {saved_model + TF-TRT} way? Which do you think is better?
{HF T5 export + ONNX} would be better IMHO. AFAIK, when you deploy using TF-TRT there are inevitable framework overheads introduced by the conversion of TF IR to TRT IR, so using ONNX is preferred.
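For a quick latency check of the ONNX path, something like this polygraphy sketch works (polygraphy ships in the TRT containers; the model path and input name are assumptions based on the encoder export discussed above):

```python
import time
import numpy as np
from polygraphy.backend.trt import (CreateConfig, EngineFromNetwork,
                                    NetworkFromOnnxPath, Profile, TrtRunner)

# Pin an optimization profile to this thread's shapes (batch 32, seq_len 128)
profile = Profile().add("input_ids", min=(1, 1), opt=(32, 128), max=(32, 128))
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("t5_encoder.onnx"),
    config=CreateConfig(profiles=[profile]))

with TrtRunner(build_engine) as runner:
    # TRT's ONNX parser casts int64 inputs to int32, so feed int32 here
    feed = {"input_ids": np.ones((32, 128), dtype=np.int32)}
    runner.infer(feed)  # warm-up (the first call also builds the engine)
    start = time.time()
    for _ in range(100):
        runner.infer(feed)
    print("mean latency: %.1f ms" % ((time.time() - start) / 100 * 1000))
```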
Thanks for your suggestion. I have some questions to discuss with you. Can I get your other contact details (email, work IM, or something else)? It would be nice if you could contact me. My email is [email protected]. Looking forward to your message. Thanks again.
Closing since there has been no activity for more than 14 days. Please reopen if you still have questions, thanks!