FasterTransformer
Run and query a finetuned / custom T5 model in Triton Inference Server
Hi guys! I have two questions; can you help me?
- Is it possible to use Triton Inference Server with a finetuned / custom T5 model whose vocabulary size differs from the original (e.g. MODEL_VOC_SIZE = 32100)? Do you have any guide or repository with an example?
- How do I send a query to the Triton server for a finetuned / custom T5 model? Do you have any guide or repository with an example?
- It is possible. When you convert the model successfully, the model configuration will be saved in a config file and the backend will read it, like here.
- Please refer to the Triton T5 guide.
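For illustration, the config file written by the conversion script typically records the model hyperparameters, including the vocabulary size, which the backend then reads at load time. The snippet below is only a sketch; the exact section and key names depend on the converter version you use, so check the file your conversion script actually emits:

```ini
; Illustrative fragment of a converted-model config file.
; Key names are assumptions -- verify against the file produced by your converter.
[encoder]
vocab_size = 32100   ; custom vocabulary size picked up by the backend
d_model = 768
num_heads = 12

[decoder]
vocab_size = 32100
d_model = 768
num_heads = 12
```

Because the backend reads `vocab_size` from this file rather than hard-coding it, a finetuned model with a non-standard vocabulary should work as long as the converted checkpoint and config are consistent.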
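As a complement to the guide, here is a minimal sketch of querying a Triton server over its HTTP endpoint using only the Python standard library, via the KServe v2 JSON inference protocol that Triton implements. The tensor names (`input_ids`, `sequence_length`, `max_output_len`) are assumptions based on typical FasterTransformer T5 backend configs; check the `config.pbtxt` of your deployed model, and note that `input_ids` must come from the same tokenizer used for finetuning.

```python
import json
import urllib.request


def build_infer_request(input_ids, max_output_len=32):
    """Build a KServe-v2-style JSON inference request body.

    input_ids: list of token-id lists, shape [batch, seq_len].
    Tensor names below are illustrative; match them to your model's
    config.pbtxt before sending a real request.
    """
    batch, seq_len = len(input_ids), len(input_ids[0])
    flat = [tok for row in input_ids for tok in row]
    return {
        "inputs": [
            {"name": "input_ids", "datatype": "INT32",
             "shape": [batch, seq_len], "data": flat},
            {"name": "sequence_length", "datatype": "INT32",
             "shape": [batch, 1], "data": [seq_len] * batch},
            {"name": "max_output_len", "datatype": "INT32",
             "shape": [batch, 1], "data": [max_output_len] * batch},
        ]
    }


def query(url, model, body):
    """POST the request body to Triton's v2 infer endpoint and return the JSON reply."""
    req = urllib.request.Request(
        f"{url}/v2/models/{model}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Build a request for a single sequence of three (hypothetical) token ids.
body = build_infer_request([[37, 423, 55]])
# To send it against a running server, e.g.:
#   result = query("http://localhost:8000", "fastertransformer", body)
#   output_ids = result["outputs"]
```

The official `tritonclient` Python package wraps this same protocol with typed helpers (`InferInput`, `InferRequestedOutput`), which is usually more convenient than raw JSON; the sketch above just shows what goes over the wire.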
@byshiue, thanks a lot for the quick response. I will check the mentioned guides in detail.
Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.