
3 issues by jayakommuru

### System Info

- GPU: L4
- GPU memory: 24 GB
- TensorRT-LLM version: v0.10.0
- Container used: tritonserver:24.06-trtllm-python-py3

### Who can help?

@byshiue @schetlur-nv

### Information

- [X] The official example scripts...

bug

We are benchmarking Triton with different backends, but we are unable to find the metric to calculate the latency of each request (let's assume each request has a batch size of...
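As a hedged aside on the question above (not part of the original issue): Triton exposes cumulative Prometheus counters on its metrics endpoint (port 8002 by default), including `nv_inference_request_duration_us` and `nv_inference_request_success`, and an average per-request latency can be derived from their ratio. The model name `encoder` and the sample counter values below are illustrative assumptions; in practice the text would come from `GET http://<triton-host>:8002/metrics`.

```python
# Sketch: derive average per-request latency from Triton's cumulative
# Prometheus counters. SAMPLE_METRICS is made-up data standing in for
# the response of http://<triton-host>:8002/metrics.
import re

SAMPLE_METRICS = """\
nv_inference_request_success{model="encoder",version="1"} 100
nv_inference_request_duration_us{model="encoder",version="1"} 5000000
"""

def counter(text: str, name: str, model: str) -> float:
    """Extract one counter value for the given metric name and model."""
    pattern = rf'{name}{{model="{model}"[^}}]*}}\s+([0-9.e+]+)'
    m = re.search(pattern, text)
    return float(m.group(1)) if m else 0.0

def avg_latency_ms(text: str, model: str) -> float:
    """Average latency per request = cumulative duration / request count."""
    total_us = counter(text, "nv_inference_request_duration_us", model)
    n = counter(text, "nv_inference_request_success", model)
    return (total_us / n) / 1000.0 if n else 0.0

print(avg_latency_ms(SAMPLE_METRICS, "encoder"))  # 50.0 (ms per request)
```

Because the counters are cumulative, per-interval latency requires taking two snapshots and dividing the deltas rather than the absolute values.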

We have an encoder-based model, currently deployed in FP16 mode in production, and we want to reduce the latency further. Does Triton support FP8? In...
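For context on the FP8 question (a hedged aside, not an answer from the issue thread): hardware FP8 (E4M3/E5M2) support starts with NVIDIA Ada Lovelace (compute capability 8.9) and Hopper (9.0), so the L4 mentioned in the first issue can run FP8 kernels in principle; whether a given TensorRT-LLM/Triton version exposes FP8 for a particular model is a separate software question. A minimal sketch, with a hand-written illustrative table of compute capabilities:

```python
# Sketch: check whether a GPU's compute capability is high enough for
# hardware FP8. The table is illustrative and not exhaustive; on a live
# machine the capability would come from a query (e.g. via pynvml or
# torch.cuda.get_device_capability) rather than a hardcoded dict.
GPU_COMPUTE_CAPABILITY = {
    "T4": (7, 5),
    "A10G": (8, 6),
    "L4": (8, 9),
    "H100": (9, 0),
}

def supports_fp8(gpu: str) -> bool:
    """FP8 tensor-core support begins at SM 8.9 (Ada) / 9.0 (Hopper)."""
    return GPU_COMPUTE_CAPABILITY[gpu] >= (8, 9)

print(supports_fp8("L4"))    # True
print(supports_fp8("A10G"))  # False
```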

question