
How should I speed up the original exported T5 saved_model using TRT?

Open chenl-chief opened this issue 3 years ago • 8 comments

My env:

  - Docker image: nvcr.io/nvidia/tensorflow:22.05-tf2-py3
  - TRT: 8.2.5.1
  - CUDA: 11.7
  - TF: 2.8
  - GPU: Tesla V100

The original saved_model takes 300 ms with batch_size=32 and seq_length=128, which is too slow to deploy, so I wanted to speed up T5 using TF-TRT. But when I convert the saved_model with the code below, TF-TRT doesn't work:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import tensorflow_text  # noqa: F401 -- registers the SentencePiece ops the T5 graph needs
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

input_saved_model_dir = 'exported_model/batch32_length128_0810/1660123651'
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'
converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_saved_model_dir,
    max_workspace_size_bytes=(1 << 32),  # 4 GiB; the original (11<32) is a boolean comparison, i.e. a 1-byte workspace
    max_batch_size=32,
    minimum_segment_size=50,   # only replace subgraphs with >= 50 nodes
    precision_mode='FP32',
    is_dynamic_op=True,        # build engines at runtime once input shapes are known
    maximum_cached_engines=1)

converter.convert()
converter.save(output_saved_model_dir)

Before using the code, you should add some code in tensorflow/python/compiler/tensorrt/trt_convert.py. The reference is here. Could somebody help me with this?
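
One quick way to tell whether the conversion above actually produced any TRT segments is to count TRTEngineOp nodes in the converted graph. A minimal sketch, assuming the output path from the code above and the default 'serve' tag:

import tensorflow as tf
import tensorflow_text  # noqa: F401 -- needed so the T5 graph's custom ops can load

# Load the converted saved_model (TF1-style) and count TRTEngineOp nodes.
# 0 means TF-TRT did not replace any subgraph, e.g. because
# minimum_segment_size=50 rejected every candidate segment.
output_saved_model_dir = 'trt_saved_model/batch32_length128_0810/1/'
with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.compat.v1.saved_model.loader.load(
        sess, ['serve'], output_saved_model_dir)
    n_trt = sum(node.op == 'TRTEngineOp'
                for node in meta_graph.graph_def.node)
    print('TRTEngineOp nodes:', n_trt)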

chenl-chief avatar Aug 11 '22 03:08 chenl-chief

@nvpohanh could you please look at this?

chenl-chief avatar Aug 11 '22 07:08 chenl-chief

I think TRT should have good out-of-the-box performance for T5 now. Can you try exporting it to ONNX and checking the throughput?

zerollzeng avatar Aug 11 '22 12:08 zerollzeng

Also, TRT 8.4 should have better performance.

zerollzeng avatar Aug 11 '22 12:08 zerollzeng

> I think TRT should have good out-of-the-box performance for T5 now. Can you try exporting it to ONNX and checking the throughput?

@zerollzeng thanks for your answer. Does "export it to onnx" mean exporting the converted saved_model to ONNX?

The normal way, following the demo notebook t5.ipynb, is to use the HuggingFace PyTorch T5 model:

  1. convert the HF PyTorch T5 model to ONNX
  2. convert the T5 ONNX model to a TRT engine (see the sketch after this list)
  3. since I want a saved_model, convert the TRT engine back to a saved_model (not sure about this)
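
For step 2, a minimal sketch with the TensorRT Python API (TRT 8.2), assuming a hypothetical t5-encoder.onnx from step 1 whose input is named input_ids; the tensorrt/t5 demo wraps this in its own helpers:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

# 't5-encoder.onnx' is a placeholder name for the model exported in step 1.
with open('t5-encoder.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError('failed to parse the ONNX model')

config = builder.create_builder_config()
config.max_workspace_size = 1 << 32  # 4 GiB (TRT 8.2 API)

# If the ONNX input has dynamic batch/sequence axes, an optimization
# profile is required; (32, 128) matches the target shapes in this thread.
profile = builder.create_optimization_profile()
profile.set_shape('input_ids', min=(1, 1), opt=(32, 128), max=(32, 128))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open('t5-encoder.plan', 'wb') as f:
    f.write(serialized_engine)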

The way I used is:

  1. export the T5 model to a saved_model using the export method in the google-research/T5 repo. The exported saved_model is TF1.x.
  2. convert the saved_model to TRT using the code I posted in my first comment above

I'm not sure mine is the right way to speed up T5, since it failed. Can I get a sped-up T5 saved_model from the originally exported model? Or should I use this repo's code (tensorrt/t5) to get a sped-up saved_model? Thanks again.

chenl-chief avatar Aug 12 '22 03:08 chenl-chief

Dose "export it to onnx" mean using coverted saved_model export to onnx?

Yes

BTW, I used https://huggingface.co/docs/transformers/model_doc/t5 before, and I think it's more convenient to export to ONNX :-)
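
For example, a minimal encoder-only export sketch (t5-small as a stand-in; the real demo also exports the decoder and handles generation):

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('t5-small').eval()
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# Wrap the encoder so it returns a plain tensor instead of a ModelOutput,
# which keeps torch.onnx.export happy.
class EncoderWrapper(torch.nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids):
        return self.encoder(input_ids=input_ids).last_hidden_state

inputs = tokenizer('translate English to German: Hello',
                   return_tensors='pt', padding='max_length', max_length=128)

torch.onnx.export(
    EncoderWrapper(model.encoder),
    (inputs.input_ids,),
    't5-encoder.onnx',  # placeholder output name
    input_names=['input_ids'],
    output_names=['hidden_states'],
    dynamic_axes={'input_ids': {0: 'batch', 1: 'sequence'}},
    opset_version=13,
)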

zerollzeng avatar Aug 12 '22 09:08 zerollzeng

Thank you.

> I think it's more convenient to export to ONNX

Yes, I think so. But unfortunately, I need a saved_model to deploy the service. I tried:

> Or should I use this repo's code (tensorrt/t5) to get a sped-up saved_model?

The latency with {V100, batch_size 32, seq_len 128, TRT 8.2.5} is 107 ms. What's your suggestion for speeding up T5: the {HF T5 export + ONNX} way or the {saved_model + TF-TRT} way? Which do you think is better?
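
For reference, a rough way to time a serialized engine at those shapes; a sketch assuming the TRT 8.2 Python bindings, pycuda, and the hypothetical t5-encoder.plan from above, not the repo's own benchmark script:

import time
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open('t5-encoder.plan', 'rb') as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
context.set_binding_shape(0, (32, 128))  # assumes binding 0 is input_ids

# One device buffer per binding; the input is filled with zeros (valid token
# ids) so the embedding lookup stays in bounds.
buffers = []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(context.get_binding_shape(i)) * np.dtype(dtype).itemsize
    buffers.append(cuda.mem_alloc(size))
cuda.memcpy_htod(buffers[0],
                 np.zeros((32, 128), dtype=trt.nptype(engine.get_binding_dtype(0))))

stream = cuda.Stream()
bindings = [int(b) for b in buffers]
for _ in range(10):  # warm-up
    context.execute_async_v2(bindings, stream.handle)
stream.synchronize()

start = time.perf_counter()
for _ in range(100):
    context.execute_async_v2(bindings, stream.handle)
stream.synchronize()
print('mean latency: %.1f ms' % ((time.perf_counter() - start) / 100 * 1e3))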

chenl-chief avatar Aug 12 '22 11:08 chenl-chief

{HF T5 export + ONNX} would be better IMHO. AFAIK, when you deploy with TF-TRT there are inevitable framework overheads introduced by the conversion of TF IR to TRT IR, so ONNX is preferred.

zerollzeng avatar Aug 12 '22 16:08 zerollzeng

Thanks for your suggestion. I have some questions to discuss with you. Could I get another way to contact you: email, work IM, or something else? It would be nice if you could reach out. My email is [email protected]. Looking forward to your message. Thanks again.

chenl-chief avatar Aug 12 '22 18:08 chenl-chief

Closing since there has been no activity for more than 14 days. Please reopen if you still have questions, thanks!

ttyio avatar Dec 12 '22 07:12 ttyio