Iman Tabrizian
@oandreeva-nv I think we did this for the Python backend, but the feedback was that it makes it harder to determine which part of the test is flaky since they are...
Unfortunately, Triton has its own serialization/deserialization for BYTES tensors, which is likely why you're observing the slowdown. Is it possible to use TYPE_UINT8 if you just want to transfer...
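A minimal client-side sketch of that workaround, assuming the model's config declares an input as `TYPE_UINT8` with variable dims (the model name `my_model`, input name `INPUT0`, and file `data.bin` are placeholders, not from the original thread):

```python
import numpy as np
import tritonclient.http as httpclient

# Raw payload that would otherwise be sent as a BYTES tensor.
payload = open("data.bin", "rb").read()

# View the bytes as a flat uint8 array; fixed-size types avoid
# Triton's per-element string serialization.
data = np.frombuffer(payload, dtype=np.uint8)

client = httpclient.InferenceServerClient(url="localhost:8000")

# The wire-level datatype string is "UINT8" ("TYPE_UINT8" in config.pbtxt).
inp = httpclient.InferInput("INPUT0", list(data.shape), "UINT8")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
```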
@VirginieBfd Can you share the full logs? It appears that the error is coming from your Python model. Also, are you able to reproduce the same error using the NGC...
You can add another output with the same name as the output state if you want to return it to the client. https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#implicit-state-management

> For debugging purposes, the client can...
Currently, yes, that's the only way. Let us know if you have ideas for other ways to extract states as well.
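For reference, once an extra output shadowing the output state is declared as described above, a client can request it like any other output. A sketch under those assumptions (the model name, tensor names, shapes, and sequence ID here are illustrative only):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

inp = grpcclient.InferInput("INPUT", [1, 4], "FP32")
inp.set_data_from_numpy(np.zeros((1, 4), dtype=np.float32))

# Request the extra output that shadows the output state.
outputs = [grpcclient.InferRequestedOutput("OUTPUT_STATE")]

result = client.infer(
    model_name="my_sequence_model",
    inputs=[inp],
    outputs=outputs,
    sequence_id=42,
    sequence_start=True,
    sequence_end=False,
)

# The state Triton will feed into the next request of this sequence.
print(result.as_numpy("OUTPUT_STATE"))
```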
Thanks for proposing a fix @HennerM and filing a detailed GitHub issue @jamied157. We'll take a look at this and get back to you.
Thanks for reporting this issue. I have filed an internal issue for further investigation.
Unfortunately, we cannot install these libraries, as doing so can increase the container size significantly, and there are many other customers asking for different libraries to be included. If we accommodate...
Can you share the max batch size in your model configuration? From the model configuration docs:

> Input and output shapes are specified by a combination of max_batch_size and the dimensions specified...
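To make that relationship concrete, here is a hedged sketch: assuming a config with `max_batch_size: 8` and `dims: [16]`, the client sends a tensor with an extra leading batch dimension (the model and tensor names are placeholders):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# With max_batch_size: 8 and dims: [16] in config.pbtxt, the full
# client-visible shape is [batch, 16], where batch <= 8.
batch = np.random.rand(4, 16).astype(np.float32)

inp = httpclient.InferInput("INPUT0", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)

result = client.infer(model_name="my_model", inputs=[inp])
```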
Hi @nathanjacobiOXOS, I'm sorry about the delay. Triton also supports auto-completing the model configuration for TRT models. Can you try running the model without any model configuration and with `--log-verbose=1`? It...
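After the server starts, one way to inspect what Triton auto-completed is to query the generated config through the client API; a small sketch, assuming the server is reachable on localhost and the model is named `my_trt_model` (both are assumptions here):

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Fetch the configuration Triton generated for the config-less model;
# the auto-completed max_batch_size, inputs, and outputs show up here.
config = client.get_model_config("my_trt_model")
print(config)
```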