
Could you give some examples of ragged input config for the TensorRT backend?

wanghuihhh opened this issue 1 year ago · 5 comments

Is your feature request related to a problem? Please describe. When I use a BERT model with variable input token lengths, I don't know how to configure ragged input so that the model can batch requests. There is no example of ragged input configuration for the TensorRT backend.

Describe the solution you'd like For example, the model's token input is [B, T, 512], where B is the dynamic batch dimension and T is the token length. How should we configure the model repository so that the model can run batched inference? We would like the model to accept a batched input of shape [B, max(T), 512] together with another input carrying the per-request lengths, such as [T1, T2, T3].


wanghuihhh · Jun 11 '24

Hi @wanghuihhh, would you be able to try enabling ragged batching for the input on the config?

kthui · Jun 11 '24

> Hi @wanghuihhh, would you be able to try enabling ragged batching for the input on the config?

I have read this blog, but I still haven't succeeded, because my model's input is 3D while a ragged input is expected to be 1D. I don't know how to modify my model to fit this feature. The introduction does not show how to flatten an input with feature dimensions down to one dimension. Could you provide a configuration example for a standard BERT model? I believe many people need it.

wanghuihhh · Jun 12 '24

Hi @matthewkotila @nv-hwoo, wondering if you have worked with a BERT model in the past? If so, would you be able to share the model config?

kthui · Jun 12 '24

@kthui: Hi @matthewkotila @nv-hwoo, wondering if you have worked with a BERT model in the past? If so, would you be able to share the model config?

I am not familiar with any BERT models right now :/

matthewkotila · Jun 12 '24

Thanks for checking @matthewkotila!

Hi @wanghuihhh, have you tried simply using dims: [ -1 ]? Was it not working?

With Triton ragged batching, the model is implemented to expect an INPUT of shape [ -1 ] and an additional batch input, INDEX, also of shape [ -1 ], which the model should use to interpret the batch elements in INPUT. For such a model, client requests don't need to be padded and can be sent as they are (with shapes [ 1, 3 ], [ 1, 4 ], [ 1, 5 ]).
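For reference, a minimal config.pbtxt sketch along those lines (a sketch only: the INPUT/INDEX names follow the description above, and max_batch_size and the data types are placeholders; pick the batch_input kind that matches how your model consumes the index, per the linked doc below):

```
max_batch_size: 16
input [
  {
    name: "INPUT"
    data_type: TYPE_FP32
    dims: [ -1 ]
    # Accept requests of different lengths without padding; each request
    # (e.g. a [T, 512] token input) arrives flattened to T*512 elements.
    allow_ragged_batch: true
  }
]
batch_input [
  {
    # Supplies accumulated element counts so the model can recover the
    # per-request boundaries inside the concatenated INPUT. The
    # BATCH_ELEMENT_COUNT kind would instead give per-request lengths
    # like [ T1, T2, T3 ].
    kind: BATCH_ACCUMULATED_ELEMENT_COUNT
    target_name: "INDEX"
    data_type: TYPE_FP32
    source_input: "INPUT"
  }
]
```

With this config, the dynamic batcher concatenates the unpadded requests along the flattened dimension and generates INDEX alongside them.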

https://github.com/triton-inference-server/server/blob/main/docs/user_guide/ragged_batching.md#example-on-ragged-input-and-batch-input

kthui · Jun 28 '24

Closing due to lack of activity. Please re-open the issue if you would like to follow up.

krishung5 · Aug 26 '24