Ryan McCormick

Results: 15 issues by Ryan McCormick

The link from backend metrics to the TRT-LLM batch manager stats is broken, so this fixes it on the public-facing side for user visibility.

Bringing this to the `main` branch as well, since the current `main` pipelines target CUDA 12.5.

### Description

Adds an OpenAI-compatible frontend for Triton Inference Server as a FastAPI application using the `tritonserver` in-process Python bindings, for the following endpoints:

- `/v1/models`
- `/v1/completions`
- ...
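As a rough illustration of what such a frontend returns, the sketch below builds an OpenAI-style model list payload, the kind of JSON a `/v1/models` endpoint serves. This is an assumption-laden sketch, not the PR's actual implementation: the `list_models` helper, the `"triton"` owner string, and the model name are all hypothetical; only the field names follow the public OpenAI API shape.

```python
# Hypothetical sketch: the JSON shape an OpenAI-compatible /v1/models
# endpoint might return. Helper name and values are stand-ins, not the
# actual frontend code from the PR.
import json
import time


def list_models(model_names):
    """Build an OpenAI-style model list response for the given names."""
    return {
        "object": "list",
        "data": [
            {
                "id": name,              # model identifier exposed to clients
                "object": "model",
                "created": int(time.time()),
                "owned_by": "triton",    # assumed owner label
            }
            for name in model_names
        ],
    }


# In a FastAPI app, a GET /v1/models handler would return this dict as JSON.
print(json.dumps(list_models(["example-model"]), indent=2))
```

A FastAPI route handler returning this dict would be serialized to JSON automatically, which is what makes the response consumable by existing OpenAI client libraries.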


When trying to add a type hint for `response: tritonserver.InferenceResponse`, I noticed I couldn't, so this exports it at the top level, similar to other types like `InferenceRequest`.
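The re-export pattern behind this change can be sketched as follows. This is a generic illustration, not the actual `tritonserver` package layout: the `pkg` / `pkg._internal` module names and the stand-in class are hypothetical, and the modules are simulated in-process so the example is self-contained.

```python
# Hypothetical sketch of re-exporting an internal class at package top level
# so users can reference it in type hints. Module and class names are
# stand-ins, not the real tritonserver layout.
import sys
import types

# Simulate an internal module that defines the type.
internal = types.ModuleType("pkg._internal")


class InferenceResponse:  # stand-in for the real response class
    pass


internal.InferenceResponse = InferenceResponse

# Simulate the package's __init__, re-exporting the type at top level
# (equivalent to `from ._internal import InferenceResponse` in __init__.py).
pkg = types.ModuleType("pkg")
pkg.InferenceResponse = internal.InferenceResponse
sys.modules["pkg._internal"] = internal
sys.modules["pkg"] = pkg

# A user can now annotate against the top-level name:
import pkg


def handle(response: pkg.InferenceResponse) -> None:
    """Callback annotated with the re-exported type."""


print(pkg.InferenceResponse is InferenceResponse)  # prints True
```

Without the top-level re-export, users would have to import from a private submodule to write the annotation, which is exactly what the PR avoids.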

Add a support matrix (and known limitations) for multi-GPU models with vLLM/TRT-LLM.