protonicage

Results: 7 issues by protonicage

### System Info
Not necessary for this case, but I use the 25.09 NGC TensorRT-LLM container for Triton Inference Server.
### Who can help?
@juney-nvidia @kaiyux
### Information -...

bug

From my understanding, embeddings are currently calculated on the whole batch, including all-zero channels; this code addresses that to improve the performance of the embedding extraction.
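
The snippet only hints at how the skip works. Below is a minimal sketch of the idea, assuming a (channels, samples) batch layout and a hypothetical `compute_embeddings` callable standing in for the real model; neither of these comes from the PR itself.

```python
import numpy as np

def extract_embeddings(batch, compute_embeddings, emb_dim):
    """Skip all-zero channels when extracting embeddings (illustrative sketch).

    batch: (n_channels, n_samples) array; some channels may be entirely zero.
    compute_embeddings: hypothetical callable mapping (k, n_samples) -> (k, emb_dim).
    """
    # Mask of channels that actually contain signal.
    nonzero = np.any(batch != 0, axis=1)

    # Allocate the full result, but only run the model on the active channels.
    out = np.zeros((batch.shape[0], emb_dim), dtype=np.float32)
    if nonzero.any():
        out[nonzero] = compute_embeddings(batch[nonzero])
    return out
```

All-zero channels then receive a zero embedding instead of a wasted forward pass.
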

**Description** Let's say you want to start multiple GPU instances for a Python Triton model. How do you do it? Short answer: I think it is currently not possible. Example:...
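
For context on what the Python backend exposes per instance, the sketch below shows how a model could read the device id Triton assigns it in `initialize`. Whether `instance_group` with `KIND_GPU` actually spreads Python-backend instances across GPUs is exactly what this issue questions, so treat this as an illustration of the API, not a confirmed answer; the tensor names and the echo logic are placeholders.

```python
import json

# Provided by the Triton Python backend at runtime, not installable via pip.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Per-instance metadata handed over by Triton; the device id arrives as a string.
        self.model_config = json.loads(args["model_config"])
        self.device_id = int(args["model_instance_device_id"])
        # A framework-specific call would pin this instance to that GPU, e.g.
        # torch.cuda.set_device(self.device_id)  # assumption: the model uses PyTorch

    def execute(self, requests):
        responses = []
        for request in requests:
            inp = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            out = pb_utils.Tensor("OUTPUT0", inp)  # placeholder: echo the input back
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```
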

**Description** I want to use batching or dynamic batching with a decoupled Python model. However, the usual approach of iterating over requests and appending tensors to a global list does...
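
As a rough illustration of what the snippet describes, here is a sketch of a decoupled `execute` that gathers tensors from all requests, runs them as one batch, and then answers each request through its response sender. The tensor names, the identity "model", and the slicing by batch dimension are assumptions, not the issue author's code.

```python
import numpy as np

# Provided by the Triton Python backend at runtime.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        # Collect the inputs from every request Triton batched together.
        senders, inputs = [], []
        for request in requests:
            senders.append(request.get_response_sender())
            inputs.append(pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy())

        # Run the (placeholder) model once on the stacked batch.
        batched = np.concatenate(inputs, axis=0)
        results = batched  # stand-in for the real model call

        # Hand each request its slice of the result, then close its stream.
        offset = 0
        for sender, inp in zip(senders, inputs):
            n = inp.shape[0]
            out = pb_utils.Tensor("OUTPUT0", results[offset:offset + n])
            sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
            offset += n

        # In decoupled mode execute() returns None; responses flow through the senders.
        return None
```

This assumes the model's config.pbtxt enables `model_transaction_policy { decoupled: true }` and `dynamic_batching {}`.
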

question

**Description** I get the error: `"failed to load 'ambernet' version 1: Invalid argument: unable to find backend library for backend 'onnxruntime', try specifying runtime on the model configuration."` **Triton...
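
The error message itself points at the model configuration; a sketch of what that might look like in `config.pbtxt` is below. The `runtime` line naming the backend's shared library is an assumption about this setup (the library name can differ between installs), so check which libraries your Triton build actually ships.

```protobuf
name: "ambernet"
backend: "onnxruntime"
# Assumption: explicitly naming the backend shared library, as the error suggests.
runtime: "libtriton_onnxruntime.so"
```
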

**Is your feature request related to a problem? Please describe.** So I used your script to build my own ONNX backend and integrated it successfully into the Triton server I...

So Mozilla launched this new approach to downloading the Common Voice dataset, which is fine. However, I would like to know how I can get a delta release now. Why?...