Travis Johnson issues




Improve handling of namespace scoped in-memory resources in the Service Controller

When the controller is running in cluster-scoped mode, it watches all namespaces in the cluster. For each namespace that is enabled for ModelMesh (i.e. has the `modelmesh-enabled: true` label), the...

enhancement
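As a rough illustration of the namespace selection described above, the Python sketch below filters a list of namespaces down to those carrying the `modelmesh-enabled: true` label. The `Namespace` class and the `modelmesh_namespaces` function are hypothetical stand-ins, not the controller's actual types. The real controller's language and its use of Kubernetes label selectors are not shown in this preview. Only the label key and value come from the issue text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Label key and value taken from the issue text; Kubernetes label values are strings.
MODELMESH_LABEL = "modelmesh-enabled"
MODELMESH_ENABLED_VALUE = "true"


@dataclass
class Namespace:
    """Hypothetical minimal model of a Kubernetes namespace."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def modelmesh_namespaces(namespaces: List[Namespace]) -> List[str]:
    """Return the names of namespaces labelled modelmesh-enabled: "true".

    In cluster-scoped mode every namespace is watched, so a filter like this
    picks out the namespaces that are enabled for ModelMesh.
    """
    return [
        ns.name
        for ns in namespaces
        if ns.labels.get(MODELMESH_LABEL) == MODELMESH_ENABLED_VALUE
    ]


if __name__ == "__main__":
    cluster = [
        Namespace("team-a", {"modelmesh-enabled": "true"}),
        Namespace("kube-system"),
        Namespace("team-b", {"modelmesh-enabled": "false"}),
    ]
    print(modelmesh_namespaces(cluster))  # ['team-a']
```

A namespace labelled with any value other than the string `"true"` (or with no label at all) is skipped.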

[Bug]: Server fails to boot due to a tensor size mismatch when LoRA is enabled for GPTBigCode

Your current environment: Collecting environment information... PyTorch version: 2.3.0+cu121; Is debug build: False; CUDA used to build PyTorch: 12.1; ROCM used to build PyTorch: N/A; OS: Ubuntu 22.04.4...

bug

[Bugfix][Core] Support prompt_logprobs returned with speculative decoding

In the current code, requesting prompt_logprobs while using speculative decoding can result in crashes. With an MLPSpeculator, an AssertionError is triggered (https://github.com/vllm-project/vllm/issues/7742). With an LLM-based draft model, the behavior has...

[Bug]: Different behavior with tool-use response parsing with streaming vs non-streaming when using max_tokens

Your current environment: The output of `python collect_env.py`: Your output of `python collect_env.py` here. Model Input Dumps: No response. 🐛 Describe the bug: When I...

bug

[Bugfix] Fix IndexError when choosing tool while having a tool parser

When the server is running with auto tool use enabled but receives a streaming request that specifies the function to call, the `[DONE]` message is not sent and there is an exception...

[Bug]: IndexError when sending a streaming request with tool use

Your current environment: The output of `python collect_env.py`: Your output of `python collect_env.py` here. Model Input Dumps: No response. 🐛 Describe the bug: When I...

bug

[Bugfix]: serialize config instances by value when using --trust-remote-code

It is not currently possible to run vLLM with a model that requires `--trust-remote-code` if the server spans multiple nodes. The server will crash with an error when it attempts...

ready
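The title contrasts serializing config instances "by value" with the default. As general background, and without claiming this is the exact failure the PR addresses: Python's `pickle` records a class by reference (its module name and qualified name), so an instance whose class lives in a dynamically loaded module can only be unpickled in a process where that module is importable. The sketch below simulates this in one process with a hypothetical module named `remote_code_config` that is present on one "node" and missing on another.

```python
import pickle
import sys
import types

# Hypothetical module standing in for code loaded via remote code on one node.
mod = types.ModuleType("remote_code_config")
exec(
    "class CustomConfig:\n"
    "    def __init__(self, hidden_size):\n"
    "        self.hidden_size = hidden_size\n",
    mod.__dict__,
)
sys.modules[mod.__name__] = mod

# pickle stores the class by reference: only the module name and class name
# are written to the payload, not the class body.
payload = pickle.dumps(mod.CustomConfig(hidden_size=4096))

# Simulate a worker node where the dynamic module was never imported.
del sys.modules[mod.__name__]

failed = False
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    # Unpickling tries to import the module by name and cannot find it.
    failed = True
    print(f"unpickling failed: {exc}")
```

The third-party `cloudpickle` library offers `cloudpickle.register_pickle_by_value(module)` to serialize a module's classes by value instead; whether this PR uses that function is not stated in the preview above.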

[Model] Add support for GraniteMoeShared models

Adds support for the `granitemoeshared` model type which is based on `granitemoe` but with the addition of a shared experts layer. A preview model with this architecture can be found...
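To make "a shared experts layer" more concrete, here is a minimal pure-Python sketch of one common shared-expert mixture-of-experts pattern: every token passes through a shared expert, and its output is added to a weighted sum of the top-k routed experts. This is a generic illustration, not the `granitemoeshared` implementation. The routing rule (softmax over the top-k router logits) and the combination step (plain addition of shared and routed outputs) are assumptions, and `moe_with_shared_expert` is a hypothetical name.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_with_shared_expert(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Expert],
    shared_expert: Expert,
    top_k: int,
) -> Vector:
    """Assumed rule: shared_expert(x) + sum of top-k gate weights * expert(x)."""
    # Pick the top_k routed experts by router logit (assumption).
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Renormalize gate weights over the selected experts only (assumption).
    weights = softmax([router_logits[i] for i in top])
    # The shared expert processes every token regardless of routing.
    out = list(shared_expert(x))
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    experts = [
        lambda v: [2.0 * a for a in v],
        lambda v: [3.0 * a for a in v],
        lambda v: [0.0 for _ in v],
    ]
    identity = lambda v: list(v)
    print(moe_with_shared_expert([1.0, 2.0], [0.0, 0.0, -5.0], experts, identity, top_k=2))
```

The design point the sketch shows is that the shared expert's contribution does not depend on the router, while the routed experts are selected per token.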