[core] Pass all driver env vars to ray workers unless excluded

Open ruisearch42 opened this issue 9 months ago • 3 comments

Currently, many driver env vars are not passed to ray workers even though they should be. This has caused bugs and confusion.

To fix this, we pass all driver env vars to ray workers unless they are excluded, based on the following considerations:

  • Many driver env vars are intended to pass through to the workers
  • Even if some of the driver env vars (prefixed with VLLM_) are not used in workers, passing them to workers typically does no harm. A short exclusion list is easier to maintain than a very long inclusion list.

We specifically exclude two types of env vars from being passed from the driver to the workers:

  • Those that are specific to workers and need to be configured per worker
  • Those that the user specifies in a config file
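
The exclusion-list approach described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `env_vars_for_workers` and the two `EXAMPLE_*` variable names are hypothetical placeholders, and the values are made up.

```python
from typing import Dict, Iterable, Mapping


def env_vars_for_workers(
    driver_env: Mapping[str, str],
    excluded: Iterable[str],
) -> Dict[str, str]:
    """Return every driver env var whose name is not in the exclusion list."""
    excluded_names = set(excluded)
    return {
        name: value
        for name, value in driver_env.items()
        if name not in excluded_names
    }


# Hypothetical exclusion list: one var that must be set per worker and
# one that the user supplies through a config file.
EXCLUDED = {"EXAMPLE_PER_WORKER_VAR", "EXAMPLE_CONFIG_FILE_VAR"}

# Stand-in for the driver's environment (os.environ in a real process).
driver_env = {
    "VLLM_USE_FLASHINFER_SAMPLER": "1",
    "VLLM_PP_LAYER_PARTITION": "4,4",
    "EXAMPLE_PER_WORKER_VAR": "0",
}

worker_env = env_vars_for_workers(driver_env, EXCLUDED)
```

One way to hand such a dict to Ray workers is through the `env_vars` field of Ray's `runtime_env`; whether this PR uses that mechanism is not stated here.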

ruisearch42 avatar Mar 02 '25 18:03 ruisearch42

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Mar 02 '25 18:03 github-actions[bot]

@ruisearch42 Great catch. Would you mind sharing the list of driver env vars that may impact cross-node inference? This fix could potentially boost the performance.

My next question would be "do we have an ETA on this?" :-D

xieus avatar Mar 02 '25 19:03 xieus

Hi @xieus, I think all env vars that are used by workers should be passed. Some examples: VLLM_USE_FLASHINFER_SAMPLER, VLLM_FLASHINFER_FORCE_TENSOR_CORES, VLLM_PP_LAYER_PARTITION

This will probably be merged soon.

ruisearch42 avatar Mar 02 '25 22:03 ruisearch42