vllm
Add `pt_load_map_location` to allow loading to cuda
Summary:
Right now PyTorch checkpoints are always loaded to the CPU device, but some checkpoints can only be loaded on CUDA or other devices. This PR adds a `pt_load_map_location` flag (defaulting to `cpu`) to `LoadConfig`.
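For context, the flag follows `torch.load`'s `map_location` semantics: it can be a device string, or a dict remapping storage locations. A minimal sketch in plain PyTorch (not vLLM's loader code):

```python
import io
import torch

# Save a small checkpoint to an in-memory buffer.
buf = io.BytesIO()
torch.save({"w": torch.ones(2, 2)}, buf)

# A device string maps every storage to that device.
buf.seek(0)
state = torch.load(buf, map_location="cpu")
assert state["w"].device.type == "cpu"

# A dict remaps specific source locations to targets, e.g.
# tensors that were saved on cuda:0 get loaded onto cpu.
buf.seek(0)
state = torch.load(buf, map_location={"cuda:0": "cpu"})
assert state["w"].device.type == "cpu"
```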
Test Plan:
python tests/test_config.py -k test_load_config_pt_load_map_location
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-int4wo-hqq --batch-size 1 --pt_load_map_location cuda:0
# note that the dict has to use double quotes since that's what JSON expects
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-int4wo-hqq --batch-size 1 --pt_load_map_location '{"": "cuda"}'
Avg latency: 1.0369751011331876 seconds
10% percentile latency: 1.0276115661486984 seconds
25% percentile latency: 1.03127920627594 seconds
50% percentile latency: 1.0362111562862992 seconds
75% percentile latency: 1.0433001522906125 seconds
90% percentile latency: 1.0466034093871712 seconds
99% percentile latency: 1.0539717579446732 seconds
Reviewers:
Subscribers:
Tasks:
Tags:
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI runs, which starts with a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.
🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @jerryzh168.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
cc @tlrmchlsmth @mgoin can you take a look
Can we just follow the PT convention here?
Is adding dict support enough? I'm not sure how to support a Callable since this arg is passed from the command line.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @jerryzh168.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Hi @houseroad can you take a look again?
Looks like we can't support a union of dict and str because dict is parsed before str: https://github.com/vllm-project/vllm/blob/5c9121203cc34e781f1f249b69cb789244e861f0/vllm/engine/arg_utils.py#L343-L347. I'll just use dict then.
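One way the union could be handled is a fallback parser (a hypothetical helper sketched here, not the code this PR ended up with): try to parse the CLI value as a JSON dict first and keep it as a plain device string if that fails:

```python
import json

def parse_map_location(value: str):
    """Parse a hypothetical --pt_load_map_location CLI value.

    Tries JSON first so dict values like '{"": "cuda"}' work;
    anything that isn't valid JSON (or isn't a JSON object) is
    kept as a plain device string such as 'cuda:0'.
    """
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return value
    return parsed if isinstance(parsed, dict) else value

print(parse_map_location('{"": "cuda"}'))  # {'': 'cuda'}
print(parse_map_location("cuda:0"))        # cuda:0
```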
@houseroad @mgoin can you merge this? We need it for a release.
Could you check whether the failed test is related, e.g., does it also fail locally without this PR?
@houseroad I can't repro locally (it failed with a different error), but I can see the same test failing in a CI job from an already-merged PR: https://buildkite.com/vllm/ci/builds/19166#01968d32-bbaf-415b-8260-546a2d512fe1