
3 issues by jayakommuru

### System Info

- GPU: L4
- GPU memory: 24 GB
- TensorRT-LLM version: v0.10.0
- Container used: tritonserver:24.06-trtllm-python-py3

### Who can help?

@byshiue @schetlur-nv

### Information

- [X] The official example scripts...

bug

We are benchmarking Triton with different backends, but we are unable to find the metric to calculate the latency of each request (let's assume each request has a batch size of...
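As a hedged aside on the question above (not part of the original issue): Triton exposes cumulative Prometheus counters on its metrics endpoint (port 8002 by default), including `nv_inference_request_duration_us` and `nv_inference_request_success`, and an average per-request latency can be derived from their ratio. The model name `encoder` and the sample counter values below are illustrative assumptions; in practice the text would come from `GET http://<triton-host>:8002/metrics`.

```python
# Sketch: derive average per-request latency from Triton's cumulative
# Prometheus counters. SAMPLE_METRICS is made-up data standing in for
# the response of http://<triton-host>:8002/metrics.
import re

SAMPLE_METRICS = """\
nv_inference_request_success{model="encoder",version="1"} 100
nv_inference_request_duration_us{model="encoder",version="1"} 5000000
"""

def counter(text: str, name: str, model: str) -> float:
    """Extract one counter value for the given metric name and model."""
    pattern = rf'{name}{{model="{model}"[^}}]*}}\s+([0-9.e+]+)'
    m = re.search(pattern, text)
    return float(m.group(1)) if m else 0.0

def avg_latency_ms(text: str, model: str) -> float:
    """Average latency per request = cumulative duration / request count."""
    total_us = counter(text, "nv_inference_request_duration_us", model)
    n = counter(text, "nv_inference_request_success", model)
    return (total_us / n) / 1000.0 if n else 0.0

print(avg_latency_ms(SAMPLE_METRICS, "encoder"))  # 50.0 (ms per request)
```

Because the counters are cumulative, per-interval latency requires taking two snapshots and dividing the deltas rather than the absolute values.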

We have an encoder-based model, currently deployed in FP16 mode in production, and we want to reduce the latency further. Does Triton support FP8? In...
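For context on the FP8 question (a hedged aside, not an answer from the issue thread): hardware FP8 (E4M3/E5M2) support starts with NVIDIA Ada Lovelace (compute capability 8.9) and Hopper (9.0), so the L4 mentioned in the first issue can run FP8 kernels in principle; whether a given TensorRT-LLM/Triton version exposes FP8 for a particular model is a separate software question. A minimal sketch, with a hand-written illustrative table of compute capabilities:

```python
# Sketch: check whether a GPU's compute capability is high enough for
# hardware FP8. The table is illustrative and not exhaustive; on a live
# machine the capability would come from a query (e.g. via pynvml or
# torch.cuda.get_device_capability) rather than a hardcoded dict.
GPU_COMPUTE_CAPABILITY = {
    "T4": (7, 5),
    "A10G": (8, 6),
    "L4": (8, 9),
    "H100": (9, 0),
}

def supports_fp8(gpu: str) -> bool:
    """FP8 tensor-core support begins at SM 8.9 (Ada) / 9.0 (Hopper)."""
    return GPU_COMPUTE_CAPABILITY[gpu] >= (8, 9)

print(supports_fp8("L4"))    # True
print(supports_fp8("A10G"))  # False
```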

question