vllm
Add `pt_load_map_location` to allow loading to cuda
Summary:
Right now PyTorch checkpoints are always loaded to the CPU device, but some checkpoints can only be loaded on CUDA or other devices. This PR adds a `pt_load_map_location` flag (defaulting to `cpu`) to `LoadConfig`.
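For context, the flag follows `torch.load`'s `map_location` semantics: it can be a device string, or a dict remapping storage locations. A minimal sketch in plain PyTorch (not vLLM's loader code):

```python
import io
import torch

# Save a small checkpoint to an in-memory buffer.
buf = io.BytesIO()
torch.save({"w": torch.ones(2, 2)}, buf)

# A device string maps every storage to that device.
buf.seek(0)
state = torch.load(buf, map_location="cpu")
assert state["w"].device.type == "cpu"

# A dict remaps specific source locations to targets, e.g.
# tensors that were saved on cuda:0 get loaded onto cpu.
buf.seek(0)
state = torch.load(buf, map_location={"cuda:0": "cpu"})
assert state["w"].device.type == "cpu"
```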
Test Plan:
python tests/test_config.py -k test_load_config_pt_load_map_location
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-int4wo-hqq --batch-size 1 --pt_load_map_location cuda:0
# note that the dict has to use double quotes since that's what JSON expects
python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model jerryzh168/phi4-mini-int4wo-hqq --batch-size 1 --pt_load_map_location '{"": "cuda"}'
Avg latency: 1.0369751011331876 seconds
10% percentile latency: 1.0276115661486984 seconds
25% percentile latency: 1.03127920627594 seconds
50% percentile latency: 1.0362111562862992 seconds
75% percentile latency: 1.0433001522906125 seconds
90% percentile latency: 1.0466034093871712 seconds
99% percentile latency: 1.0539717579446732 seconds
Reviewers:
Subscribers:
Tasks:
Tags:
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI runs, which starts with a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.
🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @jerryzh168.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
cc @tlrmchlsmth @mgoin can you take a look
Can we just follow the PT convention here?
Is adding dict support enough? I'm not sure how to support a Callable since this arg is passed from the command line.
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @jerryzh168.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Hi @houseroad can you take a look again?
Looks like we can't support a union of dict and str because dict is parsed before str: https://github.com/vllm-project/vllm/blob/5c9121203cc34e781f1f249b69cb789244e861f0/vllm/engine/arg_utils.py#L343-L347. I'll just use dict then.
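One way the union could be handled is a fallback parser (a hypothetical helper sketched here, not the code this PR ended up with): try to parse the CLI value as a JSON dict first and keep it as a plain device string if that fails:

```python
import json

def parse_map_location(value: str):
    """Parse a hypothetical --pt_load_map_location CLI value.

    Tries JSON first so dict values like '{"": "cuda"}' work;
    anything that isn't valid JSON (or isn't a JSON object) is
    kept as a plain device string such as 'cuda:0'.
    """
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return value
    return parsed if isinstance(parsed, dict) else value

print(parse_map_location('{"": "cuda"}'))  # {'': 'cuda'}
print(parse_map_location("cuda:0"))        # cuda:0
```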
@houseroad @mgoin can you merge this? We need it for a release.
Could you check whether the failed test is related, e.g., does it also fail locally without this PR?
@houseroad I can't repro locally (it failed with a different error), but I can see the same test failing in a CI job from an already-merged PR: https://buildkite.com/vllm/ci/builds/19166#01968d32-bbaf-415b-8260-546a2d512fe1