Could you give some examples of ragged input config for the TensorRT backend?
Is your feature request related to a problem? Please describe. When I use a BERT model with variable input token length, I don't know how to configure the ragged input so that the model can batch requests. There is no example of ragged input configuration for the TensorRT backend.
Describe the solution you'd like For example, the model's token input is [B, T, 512], where B is the dynamic batch size and T is the token length. How should we configure the model repository so that the model can run inference with batching? We hope the model can accept a batched input of shape [B, max(T), 512] and another input carrying the per-request lengths, such as [T1, T2, T3].
Hi @wanghuihhh, would you be able to try enabling ragged batching for the input on the config?
I have read through this blog, but I still haven't succeeded, because my model's input is 3D while the ragged input is expected to be 1D. I don't know how to modify my model to adopt this feature. Your introduction does not show how to flatten an input with feature dimensions down to one dimension. Can you provide a configuration example for a standard BERT model? I believe many people need it.
Hi @matthewkotila @nv-hwoo, wondering if you have worked with a BERT model in the past? If so, would you be able to share the model config?
@kthui: Hi @matthewkotila @nv-hwoo, wondering if you have worked with a BERT model in the past? If so, would you be able to share the model config?
I am not familiar with any BERT models right now :/
Thanks for checking @matthewkotila!
Hi @wanghuihhh, have you tried simply using dims: [ -1 ]? Was it not working?
With Triton ragged batching, the model is implemented to expect INPUT with shape [ -1 ] and an additional batch input, INDEX, with shape [ -1 ], which the model uses to interpret the batch elements in INPUT. For such a model, the client requests don't need to be padded; they can be sent as they are (with shapes [ 1, 3 ], [ 1, 4 ], [ 1, 5 ]).
https://github.com/triton-inference-server/server/blob/main/docs/user_guide/ragged_batching.md#example-on-ragged-input-and-batch-input
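A minimal config.pbtxt sketch along the lines of that documentation example; the model name, tensor names, data types, and max_batch_size below are placeholders, and the TensorRT engine itself would have to be built to accept the flattened 1-D ragged input plus the length batch input:

```
name: "bert_ragged"          # placeholder model name
backend: "tensorrt"
max_batch_size: 16

input [
  {
    # Flattened token features: each request sends its T * 512 values
    # as a single 1-D tensor instead of a [T, 512] tensor.
    name: "INPUT"
    data_type: TYPE_FP32
    dims: [ -1 ]
    allow_ragged_batch: true
  }
]

# Triton generates this tensor for the model. It holds the accumulated
# element count of INPUT across the batched requests, which the model
# can use to recover each request's boundaries (and hence its token length).
batch_input [
  {
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"
    data_type: TYPE_FP32
    source_input: "INPUT"
  }
]

output [
  {
    name: "OUTPUT"           # placeholder output
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```

With a config like this, each client request would send its tokens flattened (e.g. shape [ 1, T*512 ]) without padding; the dynamic batcher concatenates the requests along the ragged dimension and the model reads INDEX to split the concatenated INPUT back into per-request sequences.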
Closing due to lack of activity. Please re-open the issue if you would like to follow up on it.