[Misc] add CLI completion
Essential Elements of an Effective PR Description Checklist
- [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before and after, or e2e results
- [ ] (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.
Purpose
vllm has several subcommands, each with numerous arguments, which makes the CLI difficult to use without consulting references. This PR adds Bash completion support for the vllm CLI.
- Supports subcommand and option auto-completion (e.g., serve, chat, bench, etc.).
- Greatly improves usability by helping users discover and use available CLI options more easily.
- Makes working with long and complex arguments faster and less error-prone.
- Add a script `cli_args_completion_generator.py` to auto-generate the completion script from a template.
```
$ vllm [double-tabs]
bench  chat  collect-env  complete  run-batch  serve

$ vllm bench [double-tabs]
latency  serve  throughput

$ vllm serve --d[double-tabs]
--data-parallel-address       --disable-fastapi-docs
--data-parallel-backend       --disable-frontend-multiprocessing
--data-parallel-rpc-port      --disable-hybrid-kv-cache-manager
--data-parallel-size          --disable-log-requests
--data-parallel-size-local    --disable-log-stats
--data-parallel-start-rank    --disable-mm-preprocessor-cache
--device                      --disable-sliding-window
--disable-async-output-proc   --disable-uvicorn-access-log
--disable-cascade-attn        --distributed-executor-backend
--disable-chunked-mm-input    --download-dir
--disable-custom-all-reduce   --dtype
```
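For illustration, here is a trimmed sketch of the kind of static completion function such a generated script contains. The subcommand and option lists below are abbreviated placeholders, not the full generated output:

```shell
# Sketch of a static Bash completion function for vllm; lists are abbreviated.
_vllm_complete() {
    local cur="${COMP_WORDS[COMP_CWORD]}"
    local subcommands="bench chat collect-env complete run-batch serve"

    if [ "${COMP_CWORD}" -eq 1 ]; then
        # First word after "vllm": complete subcommand names.
        COMPREPLY=( $(compgen -W "${subcommands}" -- "${cur}") )
        return
    fi

    case "${COMP_WORDS[1]}" in
        bench)
            COMPREPLY=( $(compgen -W "latency serve throughput" -- "${cur}") )
            ;;
        serve)
            # Abbreviated; the generated script lists every `vllm serve` option.
            COMPREPLY=( $(compgen -W "--device --dtype --download-dir" -- "${cur}") )
            ;;
    esac
}

# Register the function for the vllm command.
complete -F _vllm_complete vllm
```

Sourcing a file like this from .bashrc is what makes the double-tab listings above work.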
Test Plan
Test Result
(Optional) Documentation Update
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.
🚀
If there is a way to generate it automatically, it might be easier to maintain.
@kebe7jun good point. I haven't tested it fully; simply generating the args for each subcommand should be fine, but generating the whole script seems to need more time to test and verify. That could be handled in a follow-up PR.
@DarkLight1337 could you please also take a look at this if you have time? Thanks a lot.
I think the most prevalent form of CLI autocompletion is `vllm autocomplete SHELL` or similar, where users enable command autocompletion with `source <(vllm autocomplete SHELL)` (one can even add this line to .bashrc, .zshrc, ...). This is how autocompletion is delivered by well-known CLI tools such as kubectl or uv, for example. I guess we could do a similar thing here by printing the content of vllm-completion.bash in a CLI command `vllm autocomplete bash`, and add similar prints for other shells of interest later?
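A minimal sketch of that suggestion, assuming a hypothetical `vllm autocomplete SHELL` subcommand. The embedded script below is an illustrative stub, not the PR's generated vllm-completion.bash:

```python
import argparse

# Illustrative stub; the real PR ships a much larger generated script.
BASH_COMPLETION_SCRIPT = """\
_vllm_complete() {
    COMPREPLY=( $(compgen -W "bench chat serve" -- "${COMP_WORDS[COMP_CWORD]}") )
}
complete -F _vllm_complete vllm
"""

SCRIPTS = {"bash": BASH_COMPLETION_SCRIPT}  # zsh, fish, ... could be added later


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vllm")
    sub = parser.add_subparsers(dest="command", required=True)
    auto = sub.add_parser("autocomplete", help="Print a shell completion script")
    auto.add_argument("shell", choices=sorted(SCRIPTS))
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    if args.command == "autocomplete":
        # Users enable completion with: source <(vllm autocomplete bash)
        print(SCRIPTS[args.shell], end="")
```

Users would then run `source <(vllm autocomplete bash)` once per session, or add that line to their shell rc file.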
My initial approach was to use argparse along with argcomplete, but I found that it tends to execute or load parts of the program during completion, which noticeably slows things down. To ensure a more responsive user experience, I switched to a shell-based completion script instead.
While click does provide built-in support for shell completion and may simplify the implementation, it also comes with certain limitations, especially when managing a large number of arguments, as is the case with vLLM. I'm not entirely confident that click can handle such complex CLI structures efficiently without introducing other trade-offs.
If we do consider switching to click, I believe we should first open a dedicated discussion to clearly compare its pros and cons against argparse, with real examples from vLLM, and make an informed decision.
For now, given the current state and the goal of improving CLI usability through autocompletion, I think the shell-based solution is a practical and lightweight approach.
> If we do consider switching to click, I believe we should first open a dedicated discussion to clearly compare its pros and cons against argparse, with real examples from vLLM, and make an informed decision.
Agreed
> For now, given the current state and the goal of improving CLI usability through autocompletion, I think the shell-based solution is a practical and lightweight approach.
I'm concerned about the maintainability of the current approach by having another place that has to be kept in sync with the current set of options. (docs being another)
Have you thought about auto-generating this from the code? It could still be checked into the tree, but there could be a pre-commit hook that validates that it's still up to date when Python code changes.
> Have you thought about auto-generating this from the code?
Yeah, I am trying to find a better way and write a script to auto-generate it.
Given our use case, it's quite common to add new arguments or deprecate old ones (please correct me if I'm mistaken). Therefore, having a script to help keep things up to date can reduce maintenance overhead and ensure consistency.
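As a hedged sketch of what such a generator could do (the parser below is a stand-in, not vLLM's real `vllm serve` parser, and `cli_args_completion_generator.py` may work differently), argparse parsers can be introspected for their registered option strings:

```python
import argparse


def collect_option_strings(parser: argparse.ArgumentParser) -> list[str]:
    """Return all long/short option strings registered on a parser."""
    opts: list[str] = []
    for action in parser._actions:  # private attribute, but stable in practice
        opts.extend(action.option_strings)
    return sorted(opts)


# Stand-in parser mimicking a few `vllm serve` options.
serve = argparse.ArgumentParser(prog="vllm serve")
serve.add_argument("--device")
serve.add_argument("--dtype")
serve.add_argument("--download-dir")

# The generator would splice this word list into a completion-script template.
print(" ".join(collect_option_strings(serve)))
```

Running the generator as part of the build (or a pre-commit hook) keeps the completion script in sync as arguments are added or deprecated.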
@russellb could you help review this?
Add a hook to verify that the CLI completion script is up to date.
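A local pre-commit hook along these lines could perform that check. The hook id, entry path, and `--check` flag are assumptions for illustration, not the PR's actual configuration:

```yaml
repos:
  - repo: local
    hooks:
      - id: check-cli-completion
        name: Verify CLI completion script is up to date
        # Hypothetical flag: regenerate and diff, failing if the checked-in
        # completion script no longer matches the argparse definitions.
        entry: python tools/cli_args_completion_generator.py --check
        language: system
        files: \.py$
        pass_filenames: false
```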
hi @DarkLight1337 sorry to bother you again. Do you happen to know if someone else might have time to take a look?
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @reidliu41.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Given the age of this PR and the uncertainty about the maintainability, I think I'm going to close this.
Thank you for making the effort to improve vLLM!