
# vllm_backend issues (9 results, sorted by recently updated)

Currently, relative paths to local models are resolved relative to the Triton server process. However, when deploying models to a central model registry, one may not know in advance where...
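To illustrate the issue, a sketch of a `model.json` for this backend that points at local weights via a relative path (the path and engine arguments here are illustrative, not from the issue):

```json
{
  "model": "./weights/my-model",
  "gpu_memory_utilization": 0.5
}
```

Whether `./weights/my-model` resolves depends on the working directory of the Triton server process, not on the model repository location, which is what the issue describes.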

Report token `usage` like OpenAI's chat completion object: https://platform.openai.com/docs/api-reference/chat/object#chat/object-usage ![image](https://github.com/triton-inference-server/vllm_backend/assets/7303612/a1e3c9ca-1d40-40f7-93e8-28d1cabb536a)
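For reference, the linked OpenAI chat completion object reports token usage in this shape (field values illustrative):

```json
{
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```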

Please note: this PR should be reviewed and merged after the server's PR: https://github.com/triton-inference-server/server/pull/7500

List metrics under the `vllm:*` namespace instead of by variable name.
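As a sketch, metrics in the `vllm:*` namespace look like this in the Prometheus exposition format (the metric name follows vLLM's own naming; labels and values are illustrative):

```
# HELP vllm:prompt_tokens_total Number of prefill tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model="vllm_model",version="1"} 12345
```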

documentation

#### What does the PR do?

Report more counter, histogram, and gauge metrics from vLLM to the Triton metrics server.

**Checklist**:
- [x] PR title reflects the change and is of format...

enhancement

# Issue

The only multi-modal input supported in the vLLM backend is Llama 3.2.

# Contribution

- Add support for Qwen2.5 multi-modal input
- Refactor code to easily add other multi-modal...

# Add Priority Request Support for vLLM Async Engine

## Description

This PR adds support for priority-based request scheduling in the vLLM async engine. When the engine is configured with...
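A minimal, self-contained sketch of the scheduling policy this PR describes: requests carry a priority value and lower values are served first, with arrival order as a tiebreaker. This is an illustration only, not vLLM's actual scheduler implementation.

```python
import heapq
import itertools

class PriorityRequestQueue:
    """Illustrative priority queue of pending requests.

    Lower priority value is served first; ties are broken by arrival order.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # monotonically increasing arrival index

    def add_request(self, request_id: str, priority: int = 0) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))

    def next_request(self) -> str:
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

q = PriorityRequestQueue()
q.add_request("req-low", priority=10)
q.add_request("req-high", priority=0)
q.add_request("req-mid", priority=5)
order = [q.next_request() for _ in range(3)]
print(order)  # ['req-high', 'req-mid', 'req-low']
```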

TODO:
- [x] Fix non-graceful shutdown
- [ ] Re-implement `build_async_engine_client_from_engine_args` for our use-case
- [ ] Implement `ProxyStatLogger(VllmStatLoggerBase)`, which will be attached to a `MQLLMEngine` process and pass metrics updates via...
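A rough sketch of the `ProxyStatLogger` idea from the TODO list: a logger attached to the engine side pushes metric updates onto a queue instead of recording them locally, and the consumer side drains them. The base-class name comes from the TODO; the `log` signature and the use of a thread-safe `queue.Queue` (a real cross-process version would use a multiprocessing queue) are assumptions for illustration.

```python
import queue

class ProxyStatLogger:
    """Forwards metric updates over a queue instead of recording them locally.

    Illustrative only: in the design sketched in the TODO, this would run
    inside the MQLLMEngine process with a multiprocessing queue.
    """

    def __init__(self, metrics_queue: queue.Queue):
        self._queue = metrics_queue

    def log(self, stats: dict) -> None:
        self._queue.put(stats)

def drain_metrics(metrics_queue: queue.Queue) -> list:
    """Collect all pending metric updates on the consumer side."""
    updates = []
    while not metrics_queue.empty():
        updates.append(metrics_queue.get_nowait())
    return updates

q = queue.Queue()
logger = ProxyStatLogger(q)
logger.log({"num_running": 2})
logger.log({"num_waiting": 1})
print(drain_metrics(q))  # [{'num_running': 2}, {'num_waiting': 1}]
```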