Could you give some examples of ragged input config for the TensorRT backend?
Is your feature request related to a problem? Please describe. When I use a BERT model with variable input token length, I don't know how to configure the ragged input so that the model can batch requests. There is no example of ragged input configuration for the TensorRT backend.
Describe the solution you'd like For example, the model's token input is [B, T, 512], where B is the dynamic batch size and T is the token length. How should we configure the model repository so that the model can run inference with batching? We hope the model can accept a batched input of shape [B, max(T), 512] and another input carrying the per-request lengths, such as [T1, T2, T3].
Hi @wanghuihhh, would you be able to try enabling ragged batching for the input on the config?
I have read through this blog, but I still haven't succeeded, because my model's input is 3D while the ragged input is expected to be 1D. I don't know how to modify my model to adopt this feature. Your introduction does not show how to flatten an input with feature dimensions down to one dimension. Can you provide a configuration example for a standard BERT model? I believe many people need it.
Hi @matthewkotila @nv-hwoo, wondering if you have worked with a BERT model in the past? If so, would you be able to share the model config?
@kthui: Hi @matthewkotila @nv-hwoo, wondering if you have worked with a BERT model in the past? If so, would you be able to share the model config?
I am not familiar with any BERT models right now :/
Thanks for checking @matthewkotila!
Hi @wanghuihhh, have you tried simply using dims: [ -1 ]? Was it not working?
With Triton ragged batching, the model is implemented to expect INPUT with shape [ -1 ] and an additional batch input, INDEX, with shape [ -1 ], which the model uses to interpret the batch elements in INPUT. For such a model, the client requests don't need to be padded; they can be sent as they are (with shapes [ 1, 3 ], [ 1, 4 ], [ 1, 5 ]).
https://github.com/triton-inference-server/server/blob/main/docs/user_guide/ragged_batching.md#example-on-ragged-input-and-batch-input
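A minimal config.pbtxt sketch along the lines of that documentation example; the model name, tensor names, data types, and max_batch_size below are placeholders, and the TensorRT engine itself would have to be built to accept the flattened 1-D ragged input plus the length batch input:

```
name: "bert_ragged"          # placeholder model name
backend: "tensorrt"
max_batch_size: 16

input [
  {
    # Flattened token features: each request sends its T * 512 values
    # as a single 1-D tensor instead of a [T, 512] tensor.
    name: "INPUT"
    data_type: TYPE_FP32
    dims: [ -1 ]
    allow_ragged_batch: true
  }
]

# Triton generates this tensor for the model. It holds the accumulated
# element count of INPUT across the batched requests, which the model
# can use to recover each request's boundaries (and hence its token length).
batch_input [
  {
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"
    data_type: TYPE_FP32
    source_input: "INPUT"
  }
]

output [
  {
    name: "OUTPUT"           # placeholder output
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```

With a config like this, each client request would send its tokens flattened (e.g. shape [ 1, T*512 ]) without padding; the dynamic batcher concatenates the requests along the ragged dimension and the model reads INDEX to split the concatenated INPUT back into per-request sequences.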
Closing due to lack of activity. Please re-open the issue if you would like to follow up on it.