Travis Johnson
When the controller is running in cluster-scoped mode, it watches all namespaces in the cluster. For each namespace that is enabled for ModelMesh (i.e. has the `modelmesh-enabled: true` label), the...
### Your current environment ``` Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4...
In the current code, requesting prompt_logprobs while using speculative decoding can result in crashes. With an MLPSpeculator, an AssertionError is triggered (https://github.com/vllm-project/vllm/issues/7742). With an LLM-based draft model, the behavior has...
### Your current environment The output of `python collect_env.py` ```text Your output of `python collect_env.py` here ``` ### Model Input Dumps _No response_ ### 🐛 Describe the bug When I...
When running the server with auto-tool use enabled but sending a streaming request that specifies the function to call, the `[DONE]` message is not sent and there is an exception...
It is not currently possible to run vLLM with a model that requires `--trust-remote-code` if the server spans multiple nodes. The server will crash with an error when it attempts...
Adds support for the `granitemoeshared` model type which is based on `granitemoe` but with the addition of a shared experts layer. A preview model with this architecture can be found...