Harry Mellor

Results: 298 comments by Harry Mellor

OK, I'm currently working on improving the left sidebar so that the entire API can be navigated properly.

I think I need a better solution to the hashing error. If possible, it would be better not to need `ModelConfig` to be hashable at all.

Since this PR has become quite big, I've been splitting it up. See the description for the list of sub-PRs.

Good point, the latest change updates the default in:

- `EngineArgs` (including the CLI arg)
- `ModelConfig`

The failing tests are likely due to changes in sampling behaviour.

Interestingly, the "V1 Test" will time out because `ModelConfig.get_diff_sampling_param()` is called for every request. The slow part of `ModelConfig.get_diff_sampling_param()` is `ModelConfig.try_get_generation_config()`, which reads the default config from disk using `GenerationConfig.from_pretrained`. This...
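To illustrate the per-request cost described in that comment, here is a minimal sketch of one common mitigation: computing the value once and caching it on the instance. It is not vLLM's actual fix. `HypotheticalModelConfig`, `_load_generation_config`, and the returned values are made up for illustration, and the "disk read" is simulated with a counter.

```python
from functools import cached_property


class HypotheticalModelConfig:
    """Stand-in for a config object; NOT vLLM's real ModelConfig."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.disk_reads = 0  # counts how often the simulated slow load runs

    def _load_generation_config(self) -> dict:
        # Hypothetical helper simulating an expensive disk read, such as a
        # call to GenerationConfig.from_pretrained. The values are invented.
        self.disk_reads += 1
        return {"temperature": 0.7, "top_p": 0.9}

    @cached_property
    def diff_sampling_param(self) -> dict:
        # Computed on first access, then stored on the instance, so later
        # accesses do not repeat the slow load.
        return self._load_generation_config()


config = HypotheticalModelConfig("some-model")
for _ in range(1000):  # one lookup per simulated request
    params = config.diff_sampling_param
```

With this pattern the slow load runs on the first access only. The trade-off is that the cached value goes stale if the underlying file changes while the process is running.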

These conflicts are caused by our migration to `ruff`. Please see https://vllm-dev.slack.com/archives/C07R5Q1Q2BB/p1759663228844749 which contains detailed instructions to make updating your branch as painless as possible.

I'm not the right person to review the speculative decoding part of the PR, so I'll leave that to the codeowners (@benchislett @luccafong).

Could you use the contents of `run_cluster.sh` as the contents of your docker service?