David Zier
Unfortunately, the engineering team can only provide support directly through this forum. If you need more immediate or direct support, I recommend looking into our [NVIDIA AI Enterprise software suite](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/)....
From what I understand, you will need another dimension just for batching. The error lists exactly what it is expecting. Have you tried: ``` name: "tokenizer" max_batch_size: 8 backend: "python"...
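A fuller sketch of what such a config could look like is below. The tensor names, data types, and shapes here are illustrative assumptions, not taken from your model:

```
name: "tokenizer"
backend: "python"
max_batch_size: 8

input [
  {
    name: "TEXT"            # assumed input tensor name
    data_type: TYPE_STRING
    dims: [ -1 ]            # variable-length dimension
  }
]
output [
  {
    name: "TOKENS"          # assumed output tensor name
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
```

Note that with `max_batch_size > 0`, Triton prepends an implicit batch dimension, so `dims` describes only the per-request shape.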
The first -1 represents a dynamic, non-specific size. This is required for dynamic batching, regardless of max batch size. For example, if you have dynamic batching enabled and max...
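As a concrete sketch of a variable-shape input (the tensor name and data type are illustrative assumptions):

```
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ -1 ]    # -1 marks a dynamic, per-request size
  }
]
```

With `max_batch_size > 0`, Triton adds the batch dimension implicitly, so the effective runtime shape is `[ batch, -1 ]`, where `batch` can be anything up to 8.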
Unfortunately, the TorchScript format used by the LibTorch (PyTorch) backend doesn't carry sufficient metadata for Triton to determine the model's inputs and outputs. This means that Triton cannot support auto-complete configs...
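Because auto-complete is unavailable for TorchScript models, the `config.pbtxt` has to spell out every input and output explicitly. A minimal sketch, assuming an image-classification model (tensor names follow the PyTorch backend's positional `INPUT__0`/`OUTPUT__0` convention; the dtypes and shapes here are assumptions):

```
name: "my_torchscript_model"
backend: "pytorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"        # positional name: argument 0 of forward()
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"       # positional name: return value 0 of forward()
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```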
PyTorch 2.0 support in the PyTorch backend has not been experimental for a long time and fully supports [torch.compile](https://docs.pytorch.org/docs/stable/generated/torch.compile.html). The documentation will be updated in the 25.10 release of...
It should be possible, but we don't have an exact guide for this specific model and GPU SKU. There are several examples of how to deploy DeepSeek, Llama 2, Stable Diffusion, and...
It sounds like you should deploy with the [POLL model control mode policy](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_management.html#modifying-the-model-repository). Combining that with [Model Versions](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html#model-versions) will allow Triton to automatically unload and load new models...
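As a sketch of that setup (the repository path, model name, and poll interval below are assumptions), you would launch Triton in POLL mode and drop new versions into numbered subdirectories:

```
# Launch with the poll model-control mode, re-scanning the repository every 30s
tritonserver --model-repository=/models \
             --model-control-mode=poll \
             --repository-poll-secs=30

# Repository layout:
#   /models/my_model/
#   ├── config.pbtxt
#   ├── 1/model.onnx    # currently served version
#   └── 2/model.onnx    # add a new version; Triton picks it up on the next poll
```

Which versions are served when a new one appears is governed by the model's version policy in `config.pbtxt`.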
This indeed looks like a valid bug. I will assign a Triton engineer to investigate further.
Which backend or framework are you using? Do you see this same memory behavior when running the model without Triton?
We are looking at ways to make the batcher more modular in the future, so that you can create your own custom batcher. For now, I will add this as...