Ryan McCormick

Results: 15 issues by Ryan McCormick

The link from backend metrics to the TRT-LLM batch manager stats is broken, so this fixes it on the public-facing side for user visibility.

Bringing this to the `main` branch as well, since the current `main` pipelines target CUDA 12.5.

### Description

Adds an OpenAI-compatible frontend for Triton Inference Server as a FastAPI application using the `tritonserver` in-process Python bindings, for the following endpoints:

- `/v1/models`
- `/v1/completions`
- ...
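As a rough illustration of what such a frontend returns, the sketch below builds an OpenAI-style model list payload, the kind of JSON a `/v1/models` endpoint serves. This is an assumption-laden sketch, not the PR's actual implementation: the `list_models` helper, the `"triton"` owner string, and the model name are all hypothetical; only the field names follow the public OpenAI API shape.

```python
# Hypothetical sketch: the JSON shape an OpenAI-compatible /v1/models
# endpoint might return. Helper name and values are stand-ins, not the
# actual frontend code from the PR.
import json
import time


def list_models(model_names):
    """Build an OpenAI-style model list response for the given names."""
    return {
        "object": "list",
        "data": [
            {
                "id": name,              # model identifier exposed to clients
                "object": "model",
                "created": int(time.time()),
                "owned_by": "triton",    # assumed owner label
            }
            for name in model_names
        ],
    }


# In a FastAPI app, a GET /v1/models handler would return this dict as JSON.
print(json.dumps(list_models(["example-model"]), indent=2))
```

A FastAPI route handler returning this dict would be serialized to JSON automatically, which is what makes the response consumable by existing OpenAI client libraries.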


When trying to add a type hint for `response: tritonserver.InferenceResponse`, I noticed I couldn't, so this exports it at the top level, similar to other types like `InferenceRequest`.
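The re-export pattern behind this change can be sketched as follows. This is a generic illustration, not the actual `tritonserver` package layout: the `pkg` / `pkg._internal` module names and the stand-in class are hypothetical, and the modules are simulated in-process so the example is self-contained.

```python
# Hypothetical sketch of re-exporting an internal class at package top level
# so users can reference it in type hints. Module and class names are
# stand-ins, not the real tritonserver layout.
import sys
import types

# Simulate an internal module that defines the type.
internal = types.ModuleType("pkg._internal")


class InferenceResponse:  # stand-in for the real response class
    pass


internal.InferenceResponse = InferenceResponse

# Simulate the package's __init__, re-exporting the type at top level
# (equivalent to `from ._internal import InferenceResponse` in __init__.py).
pkg = types.ModuleType("pkg")
pkg.InferenceResponse = internal.InferenceResponse
sys.modules["pkg._internal"] = internal
sys.modules["pkg"] = pkg

# A user can now annotate against the top-level name:
import pkg


def handle(response: pkg.InferenceResponse) -> None:
    """Callback annotated with the re-exported type."""


print(pkg.InferenceResponse is InferenceResponse)  # prints True
```

Without the top-level re-export, users would have to import from a private submodule to write the annotation, which is exactly what the PR avoids.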

Add a support matrix (and known limitations) for multi-GPU models with vLLM/TRT-LLM.