vllm [Frontend] Add -d/--detach option for vllm serve and process management

Add a -d/--detach option to vllm serve so the server can run in the background.
Add a vllm process subcommand to manage detached processes.

Motivation:

Running multiple models (e.g., chat + embedding) often requires multiple terminals or ports.
On Linux servers, it's common to manage processes via CLI without GUI or multiple windows.
Background mode makes it easier to start services without blocking the terminal.

Features:

Launches vllm serve as a detached subprocess.
Frontend logs are saved to timestamped files (e.g., vllm_2025-05-13_16-45-00.log) automatically.
CLI experience remains unchanged for non-detached usage.
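A minimal sketch of how a detached launch with a timestamped log could look, assuming a POSIX system. The function name launch_detached and its parameters are hypothetical and are not taken from this PR's code; only the log file naming follows the example above.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path


def launch_detached(cmd, log_dir):
    """Start cmd in its own session and send its output to a timestamped log.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    log_dir = Path(log_dir).expanduser()
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    log_path = log_dir / f"vllm_{stamp}.log"
    with open(log_path, "ab") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same log
            start_new_session=True,  # POSIX: new session, no controlling terminal
        )
    # The child keeps its own copy of the log file descriptor after we close ours.
    return proc, log_path
```

Calling launch_detached(["vllm", "serve", "<model>"], "~/.vllm/logs") would return immediately with the process handle and the log path, leaving the terminal free.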


vllm serve --help
INFO 05-13 17:54:03 [__init__.py:248] Automatically detected platform cuda.
usage: vllm serve [model_tag] [options]

Start the vLLM OpenAI Compatible API server.

  -d, --detach          Run the vLLM server in detached mode (background). (default: False)

  -t TIMEOUT, --timeout TIMEOUT
                        Timeout (in seconds) to wait for server startup when using --detach.
                        (default: 60)

  --log-dir LOG_DIR     Directory to store vLLM log files (only with -d). Fallback to ~/.vllm/logs if
                        no permission. (default: /var/log/vllm)

  --pid-dir PID_DIR     Directory to store PID files (only with -d). Fallback to ~/.vllm/pids if no
                        permission. (default: /var/run/vllm)
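The fallback behaviour described for --log-dir and --pid-dir could be sketched roughly as follows. The helper name resolve_dir is hypothetical, and the real option handling in this PR may differ.

```python
import os
from pathlib import Path


def resolve_dir(preferred, fallback):
    """Return the first of (preferred, fallback) that can be created and written to.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    for candidate in (Path(preferred).expanduser(), Path(fallback).expanduser()):
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            # Covers PermissionError as well as a path blocked by a regular file.
            continue
        if os.access(candidate, os.W_OK | os.X_OK):
            return candidate
    raise PermissionError(f"cannot write to {preferred} or {fallback}")
```

With the defaults shown in the help text, resolve_dir("/var/log/vllm", "~/.vllm/logs") would pick /var/log/vllm when the user can write there and the home-directory path otherwise.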

$ vllm serve meta-llama/Meta-Llama-3-8B-Instruct -d
Running detached: vllm serve meta-llama/Meta-Llama-3-8B-Instruct
vLLM server started in detached mode (instance_id: 355c017e, pid: 21967).
Logs: /Users/xx/.vllm_process/355c017e.log

======================================

$ vllm process --help
usage: vllm process [-h] {list,stop,attach,remove} ...

Manage vLLM detached processes.

positional arguments:
  {list,stop,attach,remove}
    list                List all vLLM processes.
    stop                Stop a running vLLM process.
    attach              Attach and view log of a vLLM process.
    remove              Remove a vLLM process record (running process requires --force).

options:
  -h, --help            show this help message and exit

$ vllm process list
Instance_id: cbd6197d | Pid: 21631 | Exited
Time: 2025-06-02 15:24:11
Log: /Users/xx/.vllm_process/cbd6197d.log
Cmd: vllm serve meta-llama/Meta-Llama-3-8B-Instruct

Instance_id: 355c017e | Pid: 21967 | Running
Time: 2025-06-02 15:26:25
Log: /Users/xx/.vllm_process/355c017e.log
Cmd: vllm serve meta-llama/Meta-Llama-3-8B-Instruct

$ vllm process attach -i 355c017e
Attaching to vLLM process (id: 355c017e)
Log file: /Users/xx/.vllm_process/355c017e.log
Press Ctrl+C to detach.
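As an illustration only, attaching to a log can be approximated by following the file in the style of tail -f. The names follow and attach are hypothetical, and starting from the beginning of the file is an assumption; the PR may start from the end instead.

```python
import time


def follow(path, poll_interval=0.5, should_stop=lambda: False):
    """Yield lines from path as they are appended, until should_stop() is true.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        while True:
            line = f.readline()
            if line:
                yield line
            elif should_stop():
                return
            else:
                time.sleep(poll_interval)  # nothing new yet, wait and retry


def attach(path):
    """Print the log until the user presses Ctrl+C; the server keeps running."""
    try:
        for line in follow(path):
            print(line, end="")
    except KeyboardInterrupt:
        print("\nDetached.")
```

Because attach only reads the file, pressing Ctrl+C stops the viewer without sending any signal to the server process.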

$ vllm process remove -i 355c017e
ERROR: Process pid 21967 is still running!
Please stop the process first, or use -f/--force to force remove.

$ vllm process remove -i 355c017e -f
Process pid 21967 is running. Sending SIGTERM (forced)...
Removed record for vLLM process (id: 355c017e)
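The Running/Exited status and the forced stop shown above can be approximated with standard POSIX signals. This is a sketch under that assumption; is_running and stop are hypothetical names, not functions from this PR.

```python
import os
import signal
import time


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop(pid, grace_seconds=10.0):
    """Send SIGTERM and wait up to grace_seconds; return True once the PID is gone."""
    if not is_running(pid):
        return True
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not is_running(pid):
            return True
        time.sleep(0.1)
    return False
```

Note that a child which has exited but has not yet been reaped by its parent still counts as existing for os.kill(pid, 0), and that PIDs can be reused, so a stored PID alone is not proof that the same server is still alive.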

May 13 '25 10:05 reidliu41

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

May 13 '25 10:05 github-actions[bot]

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @reidliu41.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

May 13 '25 17:05 mergify[bot]

@DarkLight1337 could you please help to take a look if you have time? thanks a lot.

May 14 '25 00:05 reidliu41

hi team, please help to take a look if you have time. thanks

May 21 '25 07:05 reidliu41

https://github.com/vllm-project/vllm/issues/17847

Jun 02 '25 08:06 reidliu41

Hi @DarkLight1337, it seems the assigned reviewers might be busy. Could someone else from the team take a quick look when they have time? Thanks a lot!

Jun 02 '25 08:06 reidliu41

Maybe @markmc? Not sure who else is really qualified to review

Jun 02 '25 08:06 DarkLight1337

OK, thanks. I'll wait for the right person.

Jun 02 '25 09:06 reidliu41

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @reidliu41.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Jun 03 '25 04:06 mergify[bot]

Hi @aarnphm, I see you have taken ownership of /vllm/entrypoints. Could you please take a look if you have time? Thanks.

Jun 06 '25 23:06 reidliu41

Hi @reidliu41, thank you for this PR and sorry about the late review. I have two high-level comments:

I think we should consider detaching the process only after the instance has started successfully.
We should also follow the general Linux practice of putting logs under /var/log and PID files in a conventional location.

Jun 14 '25 18:06 simon-mo

hi @simon-mo Thank you so much for your feedback! I’ve added a mechanism to ensure the server has started successfully before completing the detach process. Logs and PID files are now written to standard Linux directories (/var/log, /var/run), with fallbacks if permissions are insufficient. I’m not sure if this handles every edge case yet—please feel free to share any further suggestions or improvements.
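For illustration, a startup check of this kind could poll a readiness probe until a deadline. The sketch below assumes readiness can be approximated by the server port accepting TCP connections; wait_until_ready and port_open are hypothetical names and do not reflect the actual mechanism in this PR.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(is_ready, has_exited, timeout=60.0, interval=0.5):
    """Poll is_ready() until it is true, the process exits, or timeout elapses.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if has_exited():
            return False  # the server died during startup
        if is_ready():
            return True
        time.sleep(interval)
    return False  # timed out
```

With a subprocess.Popen handle named proc, a caller might use wait_until_ready(lambda: port_open("127.0.0.1", port), lambda: proc.poll() is not None) and only report a successful detach when it returns True.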

Jun 15 '25 10:06 reidliu41

I believe having minimal built-in process tracking can significantly improve the user experience when using --detach — especially in multi-process scenarios.

When a process is detached, users typically continue with other tasks in the terminal or console (especially in production environments). Important information like the PID or log path may scroll past or be lost. Without structured output or tracking, users have to manually rediscover these details every time, which isn’t ideal.

Detached mode may also be used to run multiple models in parallel (e.g., a chat model and an embedding model). In such cases, it's difficult to manage or clean up processes without a central reference for what’s running.

By saving basic info like instance ID, PID, log path, and start time in a local JSON file, we make it easier for users to list, inspect, or stop processes without relying on system tools. This isn't meant to replace ps or kill, but to provide a lightweight, user-friendly layer on top.
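A rough sketch of such a JSON record store is shown below. The file layout, the field names, and the 8-character instance ID are assumptions for illustration, as are the function names; none of them are taken from the PR's code.

```python
import json
import time
import uuid
from pathlib import Path


def load_records(path):
    """Return the stored records as a dict keyed by instance ID."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_records(path, records):
    """Write records via a temporary file so readers never see a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(records, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename within the same directory


def add_record(path, pid, log_path, cmd):
    """Store one detached process and return its new instance ID."""
    records = load_records(path)
    instance_id = uuid.uuid4().hex[:8]  # assumed ID format
    records[instance_id] = {
        "pid": pid,
        "log": str(log_path),
        "cmd": cmd,
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    save_records(path, records)
    return instance_id


def remove_record(path, instance_id):
    """Delete one record; return True if it existed."""
    records = load_records(path)
    removed = records.pop(instance_id, None) is not None
    save_records(path, records)
    return removed
```

This sketch does not lock the file, so two launches at the same moment could overwrite each other's record; a real implementation would need to handle that.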

Jun 16 '25 23:06 reidliu41

> I believe having minimal built-in process tracking can significantly improve the user experience when using --detach — especially in multi-process scenarios.
>
> [...]
>
> Detached mode may also be used to run multiple models in parallel (e.g., a chat model and an embedding model). In such cases, it's difficult to manage or clean up processes without a central reference for what’s running.

I understand this POV, but to me this is equivalent to ps -ef | grep vllm

> When a process is detached, users typically continue with other tasks in the terminal or console (especially in production environments). Important information like the PID or log path may scroll past or be lost. Without structured output or tracking, users have to manually rediscover these details every time, which isn’t ideal.

For the log path, I think we should document the default location and the naming convention (the CLI help text works as well). I think that would serve sysadmins best.

While a subcommand is nice, I'm not convinced that the UX benefit outweighs the burden of maintaining the extra code.

> By saving basic info like instance ID, PID, log path, and start time in a local JSON file, we make it easier for users to list, inspect, or stop processes without relying on system tools. This isn't meant to replace ps or kill, but to provide a lightweight, user-friendly layer on top.

My assumption for -d is that it is mostly for advanced users (I don't see a point in exposing it in, let's say, examples or introduction docs). And my assumption for advanced users is that they are well-versed in normal Linux conventions (ps, pgrep, kill, pkill, etc.).

cc @simon-mo on this. I can also be convinced otherwise.

Jun 17 '25 08:06 aarnphm

Thank you for your feedback. Maybe we should wait for more input.

Jun 18 '25 21:06 reidliu41

One more perspective is that we typically don't see users running more than one vLLM instance on a single GPU/host. Therefore the number of processes under management will typically be small.

If the vllm process command can work with non-detached processes, then I think there's some value in it too, as it can manage vLLM processes globally. But for detached processes only, I agree with @aarnphm that there is little benefit given that the number of processes will be small.

Jun 30 '25 22:06 simon-mo

Thank you both. That makes it clearer.

Jun 30 '25 23:06 reidliu41

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @reidliu41.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Jun 30 '25 23:06 mergify[bot]

updated

Jul 01 '25 02:07 reidliu41

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @reidliu41.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Jul 03 '25 06:07 mergify[bot]

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

Oct 02 '25 02:10 github-actions[bot]

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

Nov 01 '25 02:11 github-actions[bot]
