LoRA support on llama4
Essential Elements of an Effective PR Description Checklist
- [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before and after, or e2e results
- [ ] (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.
Purpose
llama4 currently does not support LoRA; this PR adds that support.
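For context, adding LoRA support to a model in vLLM generally means marking the model class as LoRA-capable and describing how its fused projections map back to the per-projection names an adapter can target. The snippet below is only a rough sketch of that pattern, assuming llama4 follows the same convention as other vLLM models; the class name and mapping entry are illustrative, not the actual diff in this PR.

```python
# Illustrative sketch only, not the diff in this PR.
# In vLLM, a model typically opts into LoRA by inheriting from SupportsLoRA
# and declaring how its fused linear layers map back to the per-projection
# names a PEFT adapter may target.
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA


class Llama4TextModelSketch(nn.Module, SupportsLoRA):  # class name is illustrative
    # Fused layer -> original projection names (example entry only).
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }
```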
Test Plan
Verified on an internal llama4 model. Further verification on the large models is still needed, but I do not have a LoRA adapter for them; I will need help from users who have this request. A minimal smoke test could look like the sketch below.
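For anyone who wants to reproduce this with their own adapter, here is a rough offline sketch using vLLM's LoRA API; the model ID, adapter path, and rank are placeholders, not the exact setup used for the internal run.

```python
# Rough offline LoRA smoke test; model ID, adapter path, and rank are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # placeholder model ID
    enable_lora=True,
    max_lora_rank=64,        # set to the adapter's rank
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["What is LoRA fine-tuning?"],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("llama4-adapter", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```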
Test Result
(Optional) Documentation Update
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
Considering that our current MoE layer doesn't support LoRA yet, llama4 may not be able to fully support LoRA.
@jeejeelee, we probably need to ask people not to add adapters for the MoE modules. Thanks @22quinn for the pointer, I will verify this adapter. It looks like its target_modules do not include the MoE modules, so it stays within what we can support now.
Tried the adapter from @22quinn; it was trained with module names that are not supported. The shared expert module has linear operators named gate_up_proj and down_proj, but the adapter targets gate_proj and up_proj, which do not align.
Log for shared expert:
(VllmWorker rank=7 pid=481083) name= language_model.model.layers.26.feed_forward.shared_expert module LlamaMLP(
(VllmWorker rank=7 pid=481083)   (gate_up_proj): MergedColumnParallelLinear(in_features=5120, output_features=2048, bias=False, tp_size=8, gather_output=False)
(VllmWorker rank=7 pid=481083)   (down_proj): RowParallelLinear(input_features=1024, output_features=5120, bias=False, tp_size=8, reduce_results=False)
(VllmWorker rank=7 pid=481083)   (act_fn): SiluAndMul()
Error: ValueError: While loading /home/wwei6/venv/llama4-adapter/llama4-medqa, expected target modules in ['linear', 'linear_1', 'gate_up_proj', 'v_proj', 'router', 'fc2', 'k_proj', 'o_proj', 'fc1', 'down_proj', 'q_proj'] but received ['language_model.model.layers.0.feed_forward.shared_expert.gate_proj', 'language_model.model.layers.0.feed_forward.shared_expert.up_proj'....
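Until a compatible adapter turns up, one quick pre-check is to compare the adapter's target_modules from its adapter_config.json against the module names in the error above. A minimal sketch (the supported list is copied from that error message, and the helper itself is hypothetical, not part of vLLM):

```python
# Hypothetical pre-check: compare a PEFT adapter's target_modules with the
# module names vLLM listed as supported in the error above.
import json
from pathlib import Path

# Copied from the ValueError message above.
SUPPORTED = {
    "linear", "linear_1", "gate_up_proj", "v_proj", "router",
    "fc2", "k_proj", "o_proj", "fc1", "down_proj", "q_proj",
}


def unsupported_targets(adapter_dir: str) -> set[str]:
    config = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    # Assumes the common case where target_modules is a list; entries may be
    # bare names ("q_proj") or full dotted paths, so keep only the leaf name.
    leaves = {name.rsplit(".", 1)[-1] for name in config["target_modules"]}
    return leaves - SUPPORTED


print(unsupported_targets("/path/to/lora-adapter"))
# For the adapter discussed above this would print {'gate_proj', 'up_proj'}.
```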
At this stage, I think the functionality is good, but we need an adapter for further checking. I've verified a small model internally but not the full-size one yet. I can follow up if any user has their own adapter.
@houseroad, so far we do not have an appropriate adapter for Scout or Maverick. Do you prefer landing this first, or checking it in later?
Let's hold on for a bit.
Currently we might only be able to add warnings, just like https://github.com/vllm-project/vllm/pull/20932 did.
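For readers following along, the idea there is roughly to warn about (and skip) LoRA targets that land on MoE expert modules instead of failing hard. A rough sketch of that pattern follows; the helper name and the expert-module heuristic are assumptions, not the actual code from that PR.

```python
# Sketch of the "warn instead of fail" pattern for MoE LoRA targets; the
# helper name and the expert-module heuristic are assumptions.
import logging

logger = logging.getLogger(__name__)

MOE_EXPERT_MARKERS = (".experts.",)  # heuristic for routed expert submodules


def filter_moe_lora_targets(target_modules: list[str]) -> list[str]:
    """Warn about LoRA targets on MoE expert modules and drop them."""
    kept = []
    for name in target_modules:
        if any(marker in name for marker in MOE_EXPERT_MARKERS):
            logger.warning(
                "LoRA target %s sits on an MoE expert module, which is not "
                "supported yet; ignoring it.", name)
        else:
            kept.append(name)
    return kept
```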
@jeejeelee, since you've added the warning over the MoE modules, is there anything else you need me to do? Shall we consider merging, or keep holding?