
[Misc] add CLI completion

Open reidliu41 opened this issue 5 months ago • 12 comments

Essential Elements of an Effective PR Description Checklist

  • [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [ ] The test plan, such as providing test command.
  • [ ] The test results, such as pasting the results comparison before and after, or e2e results
  • [ ] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

vllm has several subcommands, each with numerous arguments, which makes CLI usage difficult without a reference. This PR adds Bash completion support for the vllm CLI.

  • Supports subcommand and option auto-completion (e.g., serve, chat, bench, etc.).
  • Greatly improves usability by helping users discover and use available CLI options more easily.
  • Makes working with long and complex arguments faster and less error-prone.
  • Adds a script, cli_args_completion_generator.py, to auto-generate the completion script from a template.
$ vllm[double-tabs]
bench        chat         collect-env  complete     run-batch    serve

$ vllm bench[double-tabs]
latency     serve       throughput

$ vllm serve --d[double-tabs]
--data-parallel-address             --disable-fastapi-docs
--data-parallel-backend             --disable-frontend-multiprocessing
--data-parallel-rpc-port            --disable-hybrid-kv-cache-manager
--data-parallel-size                --disable-log-requests
--data-parallel-size-local          --disable-log-stats
--data-parallel-start-rank          --disable-mm-preprocessor-cache
--device                            --disable-sliding-window
--disable-async-output-proc         --disable-uvicorn-access-log
--disable-cascade-attn              --distributed-executor-backend
--disable-chunked-mm-input          --download-dir
--disable-custom-all-reduce         --dtype
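For illustration, a generator along these lines could walk an argparse parser and emit a Bash completion function (the parser and the emitted function below are simplified stand-ins, not the exact output of cli_args_completion_generator.py):

```python
# Sketch of what a generator like cli_args_completion_generator.py might do:
# collect the option strings from an argparse parser and render a Bash
# completion function. The "vllm serve" parser here is a tiny stand-in.
import argparse


def collect_options(parser: argparse.ArgumentParser) -> list[str]:
    # _actions is argparse's internal list of registered actions; each
    # action carries its --option strings. Internal, but stable in practice.
    opts: list[str] = []
    for action in parser._actions:
        opts.extend(action.option_strings)
    return sorted(opts)


def emit_bash_completion(prog: str, options: list[str]) -> str:
    """Render a minimal Bash completion function over a flat word list."""
    words = " ".join(options)
    return (
        f'_{prog}_complete() {{\n'
        f'    local cur="${{COMP_WORDS[COMP_CWORD]}}"\n'
        f'    COMPREPLY=( $(compgen -W "{words}" -- "$cur") )\n'
        f'}}\n'
        f'complete -F _{prog}_complete {prog}\n'
    )


serve = argparse.ArgumentParser(prog="vllm serve", add_help=False)
serve.add_argument("--dtype")
serve.add_argument("--device")

print(emit_bash_completion("vllm", collect_options(serve)))
```

Regenerating the script whenever the parsers change keeps the completion list in sync with the real CLI.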

Test Plan

Test Result

(Optional) Documentation Update

reidliu41 avatar Jun 16 '25 01:06 reidliu41

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Jun 16 '25 01:06 github-actions[bot]

If there is a way to generate it automatically, it might be easier to maintain.

kebe7jun avatar Jun 16 '25 08:06 kebe7jun

@kebe7jun good point. I haven't run full tests yet, but simply generating the args for each subcommand should be fine; generating the whole script seems to need more time to test and check. It would be OK to handle that in another PR.

reidliu41 avatar Jun 16 '25 09:06 reidliu41

@DarkLight1337 could you please also help take a look at this if you have time? Thanks a lot.

reidliu41 avatar Jun 16 '25 13:06 reidliu41

I think the most prevalent way to deliver CLI autocompletion is a vllm autocomplete SHELL subcommand or the like: users enable completion with source <(vllm autocomplete SHELL) (one can even add this line to .bashrc, .zshrc, ...). This is how autocompletion is delivered by well-known CLI tools like kubectl or uv, for example. I guess we can do a similar thing here by printing the content of vllm-completion.bash from a CLI command vllm autocomplete bash, and add similar prints for other shells of interest later?
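A minimal sketch of that delivery pattern (the autocomplete function and the embedded script below are illustrative, not vLLM's actual code):

```python
# Hypothetical sketch: a `vllm autocomplete SHELL` subcommand that prints
# the completion script to stdout, mirroring how kubectl/uv deliver theirs.

_BASH_SCRIPT = """\
_vllm_complete() {
    local cur="${COMP_WORDS[COMP_CWORD]}"
    COMPREPLY=( $(compgen -W "bench chat collect-env complete run-batch serve" -- "$cur") )
}
complete -F _vllm_complete vllm
"""

# zsh, fish, ... could be added here later.
_SCRIPTS = {"bash": _BASH_SCRIPT}


def autocomplete(shell: str) -> str:
    """Return the completion script that `vllm autocomplete SHELL` would print."""
    if shell not in _SCRIPTS:
        raise ValueError(f"unsupported shell: {shell}")
    return _SCRIPTS[shell]


# Users would then enable completion for the current session with:
#   source <(vllm autocomplete bash)
print(autocomplete("bash"))
```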

cjackal avatar Jun 16 '25 14:06 cjackal

My initial approach was to use argparse along with argcomplete, but I found that it tends to execute or load parts of the program during completion, which noticeably slows things down. To ensure a more responsive user experience, I switched to a shell-based completion script instead.
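For reference, the argcomplete wiring looks roughly like this (the parser here is a small stand-in, not vLLM's real CLI); the slowdown comes from argcomplete re-importing the module at every completion request:

```python
#!/usr/bin/env python
# PYTHON_ARGCOMPLETE_OK  -- marker argcomplete looks for near the top of the file
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in for the real vllm CLI parser.
    parser = argparse.ArgumentParser(prog="vllm")
    sub = parser.add_subparsers(dest="cmd")
    serve = sub.add_parser("serve")
    serve.add_argument("--dtype")
    return parser


parser = build_parser()
try:
    # argcomplete must hook in *before* parse_args; at completion time the
    # whole module is imported again, which is where the startup cost
    # (and the perceived lag) comes from.
    import argcomplete
    argcomplete.autocomplete(parser)
except ImportError:
    pass  # completion simply unavailable without argcomplete installed
```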

While click does provide built-in support for shell completion and may simplify the implementation, it also comes with certain limitations — especially when managing a large number of arguments, as is the case with vLLM. I'm not entirely confident that click can handle such complex CLI structures efficiently without introducing other trade-offs.

If we do consider switching to click, I believe we should first open a dedicated discussion to clearly compare its pros and cons against argparse, with real examples from vLLM, and make an informed decision.

For now, given the current state and the goal of improving CLI usability through autocompletion, I think the shell-based solution is a practical and lightweight approach.

reidliu41 avatar Jun 16 '25 23:06 reidliu41

If we do consider switching to click, I believe we should first open a dedicated discussion to clearly compare its pros and cons against argparse, with real examples from vLLM, and make an informed decision.

Agreed

For now, given the current state and the goal of improving CLI usability through autocompletion, I think the shell-based solution is a practical and lightweight approach.

I'm concerned about the maintainability of the current approach by having another place that has to be kept in sync with the current set of options. (docs being another)

Have you thought about auto-generating this from the code? It could still be checked into the tree, but there could be a pre-commit hook that validates that it's still up to date when Python code changes.
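One possible shape for such a check, as a sketch (the file name vllm-completion.bash appears in this thread; the --stdout flag and the exact generator invocation are assumptions):

```python
# Hypothetical pre-commit check: regenerate the completion script and fail
# if the checked-in copy is stale. A .pre-commit-config.yaml local hook
# could run this file whenever Python sources change.
import subprocess
from pathlib import Path


def check_up_to_date(checked_in: Path, generator_cmd: list[str]) -> bool:
    """Return True if the checked-in script matches freshly generated output."""
    result = subprocess.run(
        generator_cmd, capture_output=True, text=True, check=True
    )
    return checked_in.read_text() == result.stdout


# Example invocation (illustrative paths/flags):
#   check_up_to_date(Path("vllm-completion.bash"),
#                    ["python", "cli_args_completion_generator.py", "--stdout"])
```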

russellb avatar Jun 17 '25 01:06 russellb

Have you thought about auto-generating this from the code?

Yeah, I am trying to find a better way and write a script to auto-generate it.

reidliu41 avatar Jun 17 '25 01:06 reidliu41

Given our use case, it's quite common to add new arguments or deprecate old ones (please correct me if I'm mistaken). Therefore, having a script to help keep things up to date can reduce maintenance overhead and ensure consistency.

reidliu41 avatar Jun 17 '25 11:06 reidliu41

@russellb can you help review this?

reidliu41 avatar Jun 18 '25 12:06 reidliu41

Add a hook to verify that the CLI completion script is up to date.

reidliu41 avatar Jun 19 '25 07:06 reidliu41

hi @DarkLight1337 sorry to bother you again. Do you happen to know if someone else might have time to take a look?

reidliu41 avatar Jun 20 '25 07:06 reidliu41

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions[bot] avatar Sep 19 '25 02:09 github-actions[bot]

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @reidliu41.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Sep 19 '25 02:09 mergify[bot]

Given the age of this PR and the uncertainty about the maintainability, I think I'm going to close this.

Thank you for making the effort to improve vLLM!

hmellor avatar Sep 19 '25 14:09 hmellor