Travis Johnson

8 issues by Travis Johnson

When the controller is running in cluster-scoped mode, it watches all namespaces in the cluster. For each namespace that is enabled for ModelMesh (i.e. has the `modelmesh-enabled: true` label), the...

enhancement

### Your current environment ``` Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4...

bug

In the current code, requesting prompt_logprobs while using speculative decoding can result in crashes. With an MLPSpeculator, an AssertionError is triggered (https://github.com/vllm-project/vllm/issues/7742). With an LLM-based draft model, the behavior has...

### Your current environment The output of `python collect_env.py` ```text Your output of `python collect_env.py` here ``` ### Model Input Dumps _No response_ ### 🐛 Describe the bug When I...

bug

When the server is running with auto tool use enabled and receives a streaming request that specifies the function to call, the `[DONE]` message is not sent and there is an exception...

### Your current environment The output of `python collect_env.py` ```text Your output of `python collect_env.py` here ``` ### Model Input Dumps _No response_ ### 🐛 Describe the bug When I...

bug

It is not currently possible to run vLLM with a model that requires `--trust-remote-code` if the server spans multiple nodes. The server will crash with an error when it attempts...

ready

Adds support for the `granitemoeshared` model type which is based on `granitemoe` but with the addition of a shared experts layer. A preview model with this architecture can be found...