LoRA support on llama4
Essential Elements of an Effective PR Description Checklist
- [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before and after, or e2e results
- [ ] (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.
Purpose
llama4 currently does not support LoRA; this PR adds that support.
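For context, adding LoRA support to a model in vLLM generally means marking the model class as LoRA-capable and describing how its fused projections map back to the per-projection names an adapter can target. The snippet below is only a rough sketch of that pattern, assuming llama4 follows the same convention as other vLLM models; the class name and mapping entry are illustrative, not the actual diff in this PR.

```python
# Illustrative sketch only, not the diff in this PR.
# In vLLM, a model typically opts into LoRA by inheriting from SupportsLoRA
# and declaring how its fused linear layers map back to the per-projection
# names a PEFT adapter may target.
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA


class Llama4TextModelSketch(nn.Module, SupportsLoRA):  # class name is illustrative
    # Fused layer -> original projection names (example entry only).
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }
```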
Test Plan
Verified on an internal llama4 model. Further verification on the large models is still needed, but I do not have a LoRA adapter for them; I will need help from users who have this request. A minimal smoke test could look like the sketch below.
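For anyone who wants to reproduce this with their own adapter, here is a rough offline sketch using vLLM's LoRA API; the model ID, adapter path, and rank are placeholders, not the exact setup used for the internal run.

```python
# Rough offline LoRA smoke test; model ID, adapter path, and rank are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # placeholder model ID
    enable_lora=True,
    max_lora_rank=64,        # set to the adapter's rank
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["What is LoRA fine-tuning?"],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("llama4-adapter", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```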
Test Result
(Optional) Documentation Update
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.
🚀
Considering that our current MoE layer doesn't support LoRA yet, llama4 may not be able to fully support LoRA.
@jeejeelee, we probably need to ask people not to add adapters for the MoE modules. Thanks @22quinn for the pointer, I will verify this adapter. It looks like its target_modules do not include the MoE modules, so it stays within what we can support now.
Tried the adapter from @22quinn; it was trained with module names that are not supported. The shared expert module has linear operators named gate_up_proj and down_proj, but the adapter targets gate_proj and up_proj, which do not align.
Log for shared expert:
(VllmWorker rank=7 pid=481083) name= language_model.model.layers.26.feed_forward.shared_expert module LlamaMLP(
(VllmWorker rank=7 pid=481083)   (gate_up_proj): MergedColumnParallelLinear(in_features=5120, output_features=2048, bias=False, tp_size=8, gather_output=False)
(VllmWorker rank=7 pid=481083)   (down_proj): RowParallelLinear(input_features=1024, output_features=5120, bias=False, tp_size=8, reduce_results=False)
(VllmWorker rank=7 pid=481083)   (act_fn): SiluAndMul()
Error: ValueError: While loading /home/wwei6/venv/llama4-adapter/llama4-medqa, expected target modules in ['linear', 'linear_1', 'gate_up_proj', 'v_proj', 'router', 'fc2', 'k_proj', 'o_proj', 'fc1', 'down_proj', 'q_proj'] but received ['language_model.model.layers.0.feed_forward.shared_expert.gate_proj', 'language_model.model.layers.0.feed_forward.shared_expert.up_proj'....
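Until a compatible adapter turns up, one quick pre-check is to compare the adapter's target_modules from its adapter_config.json against the module names in the error above. A minimal sketch (the supported list is copied from that error message, and the helper itself is hypothetical, not part of vLLM):

```python
# Hypothetical pre-check: compare a PEFT adapter's target_modules with the
# module names vLLM listed as supported in the error above.
import json
from pathlib import Path

# Copied from the ValueError message above.
SUPPORTED = {
    "linear", "linear_1", "gate_up_proj", "v_proj", "router",
    "fc2", "k_proj", "o_proj", "fc1", "down_proj", "q_proj",
}


def unsupported_targets(adapter_dir: str) -> set[str]:
    config = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    # Assumes the common case where target_modules is a list; entries may be
    # bare names ("q_proj") or full dotted paths, so keep only the leaf name.
    leaves = {name.rsplit(".", 1)[-1] for name in config["target_modules"]}
    return leaves - SUPPORTED


print(unsupported_targets("/path/to/lora-adapter"))
# For the adapter discussed above this would print {'gate_proj', 'up_proj'}.
```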
At this stage, I think the functionality is good, but we need an adapter for further checking. I've verified a small model internally but not the full-size one yet. I can follow up if any user has their own adapter.
@houseroad, so far we do not have an appropriate adapter for Scout or Maverick. Do you prefer landing this first, or checking it in later?
Let's hold on for a bit.
Currently we might only be able to add warnings, just like https://github.com/vllm-project/vllm/pull/20932 did.
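For readers following along, the idea there is roughly to warn about (and skip) LoRA targets that land on MoE expert modules instead of failing hard. A rough sketch of that pattern follows; the helper name and the expert-module heuristic are assumptions, not the actual code from that PR.

```python
# Sketch of the "warn instead of fail" pattern for MoE LoRA targets; the
# helper name and the expert-module heuristic are assumptions.
import logging

logger = logging.getLogger(__name__)

MOE_EXPERT_MARKERS = (".experts.",)  # heuristic for routed expert submodules


def filter_moe_lora_targets(target_modules: list[str]) -> list[str]:
    """Warn about LoRA targets on MoE expert modules and drop them."""
    kept = []
    for name in target_modules:
        if any(marker in name for marker in MOE_EXPERT_MARKERS):
            logger.warning(
                "LoRA target %s sits on an MoE expert module, which is not "
                "supported yet; ignoring it.", name)
        else:
            kept.append(name)
    return kept
```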
@jeejeelee, since you've added the warning over the MoE modules, is there anything else you need me to do? Shall we consider merging, or keep holding?