David Zier
Unfortunately, the engineering team can only provide support directly through this forum. If you need more immediate or direct support, I recommend looking into our [NVIDIA AI Enterprise software suite](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/)....
From what I understand, you will need another dimension just for batching. The error lists exactly what it is expecting. Have you tried: ``` name: "tokenizer" max_batch_size: 8 backend: "python"...
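A fuller sketch of what such a config could look like is below. The tensor names, data types, and shapes here are illustrative assumptions, not taken from your model:

```
name: "tokenizer"
backend: "python"
max_batch_size: 8

input [
  {
    name: "TEXT"            # assumed input tensor name
    data_type: TYPE_STRING
    dims: [ -1 ]            # variable-length dimension
  }
]
output [
  {
    name: "TOKENS"          # assumed output tensor name
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
```

Note that with `max_batch_size > 0`, Triton prepends an implicit batch dimension, so `dims` describes only the per-request shape.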
The first -1 represents a dynamic, non-specific size. This is required for dynamic batching, regardless of max batch size. For example, if you have dynamic batching enabled and max...
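As a concrete sketch of a variable-shape input (the tensor name and data type are illustrative assumptions):

```
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ -1 ]    # -1 marks a dynamic, per-request size
  }
]
```

With `max_batch_size > 0`, Triton adds the batch dimension implicitly, so the effective runtime shape is `[ batch, -1 ]`, where `batch` can be anything up to 8.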
Unfortunately, the TorchScript format used by the LibTorch (PyTorch) backend doesn't carry sufficient metadata for Triton to determine the model's inputs and outputs. This means that Triton cannot support auto-complete configs...
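Because auto-complete is unavailable for TorchScript models, the `config.pbtxt` has to spell out every input and output explicitly. A minimal sketch, assuming an image-classification model (tensor names follow the PyTorch backend's positional `INPUT__0`/`OUTPUT__0` convention; the dtypes and shapes here are assumptions):

```
name: "my_torchscript_model"
backend: "pytorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"        # positional name: argument 0 of forward()
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"       # positional name: return value 0 of forward()
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```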
PyTorch 2.0 support in the PyTorch backend has not been experimental for a long time and fully supports [torch.compile](https://docs.pytorch.org/docs/stable/generated/torch.compile.html). The documentation will be updated in the 25.10 release of...
It should be possible, but we don't have an exact guide for this specific model and GPU SKU. There are several examples of how to deploy DeepSeek, Llama 2, Stable Diffusion, and...
It sounds like you should deploy with the [POLL model control mode policy](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_management.html#modifying-the-model-repository). Combining that with [Model Versions](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html#model-versions) will allow Triton to automatically unload and load new models...
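As a sketch of that setup (the repository path, model name, and poll interval below are assumptions), you would launch Triton in POLL mode and drop new versions into numbered subdirectories:

```
# Launch with the poll model-control mode, re-scanning the repository every 30s
tritonserver --model-repository=/models \
             --model-control-mode=poll \
             --repository-poll-secs=30

# Repository layout:
#   /models/my_model/
#   ├── config.pbtxt
#   ├── 1/model.onnx    # currently served version
#   └── 2/model.onnx    # add a new version; Triton picks it up on the next poll
```

Which versions are served when a new one appears is governed by the model's version policy in `config.pbtxt`.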
This indeed looks like a valid bug. I will assign a Triton engineer to investigate further.
Which backend or framework are you using? Do you see this same memory behavior when running the model without Triton?
We are looking at ways to make the batcher more modular in the future, so that you can create your own custom batcher. For now, I will add this as...