Piotr Marcinkiewicz
Thank you for your question. It seems like you're trying to set the priority for inference requests when using a `DynamicBatcher` in Triton Inference Server. From the code you posted,...
I will consider your request as a feature proposal.
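For context, here is a minimal sketch of the priority-related knobs that `DynamicBatcher` exposes today (the model name, tensor names, and concrete values are illustrative); setting a priority per request from the client is the part that would need the proposed feature:

```python
import numpy as np

from pytriton.decorators import batch
from pytriton.model_config import DynamicBatcher, ModelConfig, Tensor
from pytriton.triton import Triton


@batch
def infer_fn(INPUT_1: np.ndarray):
    # Passthrough callable; the batcher configuration is the point here.
    return {"OUTPUT_1": INPUT_1}


# priority_levels / default_priority_level are the priority-related fields
# DynamicBatcher exposes; in Triton's scheduler, level 1 is the highest priority.
batcher = DynamicBatcher(
    max_queue_delay_microseconds=100,
    priority_levels=2,
    default_priority_level=2,
)

with Triton() as triton:
    triton.bind(
        model_name="Prioritized",  # hypothetical model name
        infer_func=infer_fn,
        inputs=[Tensor(name="INPUT_1", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="OUTPUT_1", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=16, batcher=batcher),
    )
    triton.serve()
```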
Integrating Whisper (via TensorRT-LLM) with PyTriton for speech-to-text is an exciting challenge. However, PyTriton doesn't currently support streaming directly; for streaming audio, we might need a different setup,...
@yuekaizhang, thank you for your proposal. While I recognize the potential value of integrating your Whisper example with the PyTriton repository, there are some complexities related to the additional dependencies...
From your inquiry, it seems you are looking to run PyTriton in an environment with glibc 2.32 to serve your reward model, but are facing difficulties due to version compatibility...
Thank you for providing more details on your requirements. Since you need Python 3.10 support, but are constrained by the older glibc version, here's an adjusted approach that might work...
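One workaround sketch, assuming you can run the server inside a newer-glibc container and only need to call it from the glibc-2.32 host: the standalone `tritonclient` package does not bundle the Triton server binary, so the host-side code can stay thin. The model and tensor names below are hypothetical:

```python
# Runs on the glibc-2.32 host; the PyTriton server runs elsewhere
# (e.g., in an Ubuntu 22.04 container) and is reachable on port 8000.
# Requires: pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical reward-model signature: one FP32 input, one FP32 output.
inp = httpclient.InferInput("INPUT_1", [1, 3], "FP32")
inp.set_data_from_numpy(np.array([[0.1, 0.2, 0.3]], dtype=np.float32))

result = client.infer("RewardModel", inputs=[inp])
print(result.as_numpy("OUTPUT_1"))
```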
@martin-liu We are considering migrating the PyTriton client to the [tritonclient](https://github.com/triton-inference-server/client) repository. It will have a new API much better aligned with Triton. We have several proposals for the new API revamp. **Synchronous interface**...
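The proposals themselves are elided above; for reference, here is a sketch of the current synchronous interface in `pytriton.client` (the model name and input are illustrative):

```python
import numpy as np

from pytriton.client import ModelClient

# ModelClient is the current blocking client; infer_batch sends one batch
# and returns a dict mapping output names to numpy arrays.
with ModelClient("localhost", "Linear") as client:
    batch = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)
    outputs = client.infer_batch(batch)
```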
I appreciate your efforts in troubleshooting the issue with the Triton server. To further assist, I recommend starting by ensuring your PyTriton installation is updated to the latest version (0.5.3). To debug,...
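As one debugging step, the server can be started with verbose logging through `TritonConfig` (a sketch; `4` is just an example verbosity level):

```python
from pytriton.triton import Triton, TritonConfig

# log_verbose raises the tritonserver log level; logs go to the console
# unless log_file is also set in TritonConfig.
config = TritonConfig(log_verbose=4)
with Triton(config=config) as triton:
    ...  # bind the model here as usual, then call triton.serve()
```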
Thank you for your interest in PyTriton and stateful models. I have experimented with sequence support in the client, but I realized that PyTriton does not pass sequence information to...
Let's start with a simple Linear model, which takes a single input tensor and returns the negative of the input tensor:

```python
import numpy as np

from pytriton.decorators import batch


@batch
def infer_fn(INPUT_1: np.ndarray):
    # Negate every element of the batched input tensor.
    return {"OUTPUT_1": -INPUT_1}
```
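With the callable defined, here is a sketch of binding and serving it (the model name and tensor names are illustrative):

```python
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

with Triton() as triton:
    # Bind the Python callable under a model name so Triton can route
    # inference requests to it.
    triton.bind(
        model_name="Linear",
        infer_func=infer_fn,
        inputs=[Tensor(name="INPUT_1", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="OUTPUT_1", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```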