vllm_backend
Currently, relative paths to local models are resolved relative to the Triton server process. However, when deploying models to a central model registry, one may not know in advance where...
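One way to address this, sketched below under assumed names (the helper and its arguments are illustrative, not the backend's actual API), is to anchor relative paths at the model's own directory rather than the server process's working directory:

```python
# Hypothetical sketch: resolve a relative model path against the model's
# own directory instead of the Triton server process's cwd.
# `resolve_model_path` and its arguments are illustrative names.
from pathlib import Path


def resolve_model_path(model_path: str, model_dir: str) -> str:
    """Return an absolute path; relative paths are anchored at model_dir."""
    p = Path(model_path)
    if p.is_absolute():
        return str(p)
    # Anchor at the model directory, then normalize.
    return str((Path(model_dir) / p).resolve())


# A relative path stays usable no matter where the server was launched from.
print(resolve_model_path("weights/llama", "/opt/registry/models/m1"))
```

With this approach, moving the model repository does not break relative references, since they travel with the model directory.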
Like the `usage` field of OpenAI's chat completion object: https://platform.openai.com/docs/api-reference/chat/object#chat/object-usage
Please note: this PR should be reviewed and merged after the server's PR: https://github.com/triton-inference-server/server/pull/7500
#### What does the PR do?
Report more counter, histogram, and gauge metrics from vLLM to the Triton metrics server.

**Checklist**:
- [x] PR title reflects the change and is of format...
# Issue
The only multi-modal input supported in the vLLM backend is Llama 3.2.
# Contribution
- Add support for Qwen2.5 multi-modal input
- Refactor code to easily add other multi-modal...
# Add Priority Request Support for vLLM Async Engine
## Description
This PR adds support for priority-based request scheduling in the vLLM async engine. When the engine is configured with...
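The scheduling idea behind such a change can be illustrated with a minimal sketch (not the actual backend code): with a priority policy, requests carrying a lower priority value are served first, and ties fall back to arrival order. The class and names below are assumptions for illustration only.

```python
# Illustrative priority queue for request scheduling: lower priority value
# is served first; arrival order breaks ties. Not the vLLM implementation.
import heapq
import itertools


class PriorityRequestQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # monotonically increasing tie-breaker

    def add(self, request_id: str, priority: int = 0) -> None:
        # (priority, arrival_seq, id): heapq orders by priority, then arrival.
        heapq.heappush(self._heap, (priority, next(self._seq), request_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


q = PriorityRequestQueue()
q.add("req-a", priority=5)
q.add("req-b", priority=1)
q.add("req-c", priority=1)
print([q.pop(), q.pop(), q.pop()])  # → ['req-b', 'req-c', 'req-a']
```

The `(priority, seq, id)` tuple ordering keeps the heap stable for equal priorities, which matters so that a burst of same-priority requests is not reordered arbitrarily.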
TODO:
- [x] Fix non-graceful shutdown
- [ ] Re-implement `build_async_engine_client_from_engine_args` for our use case
- [ ] Implement `ProxyStatLogger(VllmStatLoggerBase)`, which will be attached to a `MQLLMEngine` process and pass metrics updates via...
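The last TODO item could be sketched roughly as follows, assuming the updates are passed over a `multiprocessing.Queue` (the class body, method names, and metric names here are placeholders, not the planned implementation):

```python
# Hedged sketch of the ProxyStatLogger idea: a logger living inside the
# engine process forwards raw metric updates over a multiprocessing.Queue
# so the parent process can publish them to the Triton metrics server.
import multiprocessing as mp


class ProxyStatLogger:
    """Engine-process side: ships (name, value) metric updates out."""

    def __init__(self, queue: mp.Queue):
        self._queue = queue

    def log(self, name: str, value: float) -> None:
        self._queue.put((name, value))


def drain(queue, n: int):
    """Parent-process side: collect exactly n pending updates."""
    return [queue.get() for _ in range(n)]


queue = mp.Queue()
logger = ProxyStatLogger(queue)
logger.log("num_requests_running", 3)
logger.log("time_to_first_token", 0.042)
print(drain(queue, 2))  # → [('num_requests_running', 3), ('time_to_first_token', 0.042)]
```

Keeping the logger a thin proxy means the engine process never touches the metrics server directly; all aggregation stays in the parent, which avoids duplicating Prometheus state across processes.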