FasterTransformer

Add TensorFlow Ops for T5

Open • vlasenkoalexey opened this issue 3 years ago • 4 comments

Description

I saw you recently added T5 kernels for PyTorch here: https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/th_op/t5
Would it be possible to also add corresponding Ops for TensorFlow?
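
For context, here is a rough sketch of the kind of usage I have in mind, following the pattern of the existing FT TensorFlow decoding ops. The library path, op name, and arguments below are hypothetical placeholders, not an existing FT API:

```python
import tensorflow as tf

# Hypothetical sketch only: libtf_t5.so and t5_decoding do not exist in
# FasterTransformer today; they just mirror how the existing FT TF
# decoding ops are loaded via tf.load_op_library.
t5_op_lib = tf.load_op_library("./lib/libtf_t5.so")  # placeholder path

# Dummy encoder output, just to make the call shape-complete.
encoder_output = tf.zeros([1, 32, 512], dtype=tf.float32)
encoder_sequence_length = tf.constant([32], dtype=tf.int32)

output_ids = t5_op_lib.t5_decoding(   # placeholder op name and signature
    encoder_output,
    encoder_sequence_length,
    beam_width=4,
    max_seq_len=200,
)
```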

Reproduced Steps

See description.

vlasenkoalexey avatar May 31 '22 20:05 vlasenkoalexey

Thanks for your feedback; we will consider it.

byshiue avatar Jun 01 '22 11:06 byshiue

@byshiue will the FT TF op be on the roadmap for the next release? The TF op turns out to be faster than the th (PyTorch) op in the decoder/decoding benchmarks and is easier to use in production. Thanks.

gyin94 avatar Aug 24 '22 22:08 gyin94

Thank you, we will consider it.

But in our benchmarks of the TF op and the PyTorch op, their performance in FT is similar: for some cases the FT TF op is faster, and for others the FT PyTorch op is faster.

For production usage, we recommend trying the Triton backend, which helps batch requests and supports several frameworks as backends.
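
For reference, a minimal sketch of sending a request to a Triton server with the Python tritonclient; the model name and tensor names here are placeholders and should match whatever the deployed model's config defines:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy token ids; real input names and dtypes depend on the model config.
input_ids = np.ones((1, 32), dtype=np.uint32)
infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "UINT32")
infer_input.set_data_from_numpy(input_ids)

result = client.infer(
    model_name="fastertransformer",  # placeholder model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output_ids")],
)
print(result.as_numpy("output_ids").shape)
```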

byshiue avatar Aug 25 '22 00:08 byshiue

Yes. The FT native Triton backend is very cool if the use case is only generation. One advantage of using the TF op is that we can combine it with other TF ops and export everything as a single TF model. We found this very useful, since adding such a custom TF op to the Triton TF backend or to TF Serving is also straightforward and easy (with automatic batching as well). In addition, a TF model can have multiple signatures for the same model, which is not yet supported by PyTorch or the native FT backend; a use case would be a different beam search size per signature. Btw, the dynamic beam search in FT v5 is super cool.
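
For illustration, a minimal sketch of exporting one TF model with two signatures that differ only in beam width; the FT decoding call is stubbed out with tf.identity, and all names are just examples:

```python
import tensorflow as tf

class T5Wrapper(tf.Module):
    """Toy wrapper: in practice _decode would run the encoder plus the
    FT decoding op with the requested beam width."""

    def _decode(self, input_ids, beam_width):
        # Placeholder for the real encoder + FT decoding call.
        return tf.identity(input_ids)

    @tf.function(input_signature=[tf.TensorSpec([None, None], tf.int32)])
    def serve_beam_1(self, input_ids):
        return {"output_ids": self._decode(input_ids, beam_width=1)}

    @tf.function(input_signature=[tf.TensorSpec([None, None], tf.int32)])
    def serve_beam_4(self, input_ids):
        return {"output_ids": self._decode(input_ids, beam_width=4)}

model = T5Wrapper()
tf.saved_model.save(
    model,
    "export/t5_ft",  # example export path
    signatures={"beam_1": model.serve_beam_1, "beam_4": model.serve_beam_4},
)
```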

gyin94 avatar Aug 25 '22 09:08 gyin94

This feature is supported in the latest v5.2 release.

byshiue avatar Dec 02 '22 14:12 byshiue