
[Model] LoRA with lm_head fully trained

Open sergeykochetkov opened this issue 1 year ago • 2 comments

FIX #4186 #2816

Support fully-trained lm_head and embed_tokens modules in LoRA adapters.

We found that the quality of our adapters drops significantly without a fully-trained lm_head, or with lm_head trained in LoRA style. This corresponds to peft's modules_to_save=["lm_head", "embed_tokens"] functionality: https://huggingface.co/docs/peft/v0.12.0/en/package_reference/#peft.LoraConfig.modules_to_save
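For context, adapters of this kind are produced with a peft config like the sketch below. The rank and target modules are illustrative assumptions, not taken from this PR; the relevant part is modules_to_save, which makes peft store the complete weights of those modules in the adapter checkpoint instead of low-rank A/B pairs.

```python
# Illustrative peft training config for an adapter this PR can serve.
# r and target_modules are assumed values; modules_to_save is the point.
from peft import LoraConfig

config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    # Train these modules fully instead of as low-rank adapters;
    # peft saves their complete weights in the adapter checkpoint.
    modules_to_save=["lm_head", "embed_tokens"],
)
```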

The idea is to replace the base model's VocabParallelEmbedding and ParallelLMHead with the layers loaded from modules_to_save when running inference with the LoRA adapter.

  • [x] dirty implementation
  • [x] tests for new functionality
  • [ ] check that existing functionality still works
  • [x] performance measurement for inference with a fully-trained lm_head
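The replacement described above can be sketched as follows. This is a hypothetical, framework-free illustration of the weight-swapping idea, not vLLM's actual loader code; the key names mimic how peft stores modules_to_save weights, but apply_modules_to_save and all variable names are assumptions.

```python
# Hypothetical sketch: when an adapter was trained with peft's
# modules_to_save=["lm_head", "embed_tokens"], those modules are stored
# fully (not as low-rank A/B pairs), so at LoRA inference time the base
# model's weights are simply replaced rather than patched.

def apply_modules_to_save(base_weights, adapter_weights, modules_to_save):
    """Return base weights with fully-trained modules swapped in."""
    merged = dict(base_weights)
    for name, tensor in adapter_weights.items():
        # peft stores these under keys like
        # "base_model.model.lm_head.modules_to_save.default.weight"
        for module in modules_to_save:
            if f"{module}.modules_to_save" in name:
                merged[f"{module}.weight"] = tensor  # full replacement
    return merged

base = {"lm_head.weight": "base_lm_head", "embed_tokens.weight": "base_embed"}
adapter = {
    "base_model.model.lm_head.modules_to_save.default.weight": "trained_lm_head",
    "base_model.model.embed_tokens.modules_to_save.default.weight": "trained_embed",
}
merged = apply_modules_to_save(base, adapter, ["lm_head", "embed_tokens"])
print(merged["lm_head.weight"])       # trained_lm_head
print(merged["embed_tokens.weight"])  # trained_embed
```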

sergeykochetkov avatar Sep 02 '24 11:09 sergeykochetkov

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests that quickly catches errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

github-actions[bot] avatar Sep 02 '24 11:09 github-actions[bot]

/ready

sergeykochetkov avatar Sep 11 '24 13:09 sergeykochetkov

Should it be unmarked as Draft?

AlongWY avatar Sep 18 '24 14:09 AlongWY

This pull request has merge conflicts that must be resolved before it can be merged. @sergeykochetkov please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Oct 30 '24 12:10 mergify[bot]

/ready

sergeykochetkov avatar Nov 01 '24 11:11 sergeykochetkov

Should it be unmarked as Draft?

Yes, I am waiting for review.

sergeykochetkov avatar Nov 01 '24 11:11 sergeykochetkov

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @sergeykochetkov.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify[bot] avatar Nov 12 '24 19:11 mergify[bot]

Great work! Some CI runs show namespace inconsistencies for newly added symbols. I think it is time to fix those and merge once the CIs pass. @youkaichao

AaronZLT avatar Dec 07 '24 10:12 AaronZLT

Just wanted to throw out that this is something I am looking forward to.

I am attempting to use Qwen/Qwen2.5-14B as a base model and load two LoRAs with the OpenAI API. One of the LoRAs is just the Instruct model extracted as a LoRA from the base. The other is a fine-tune that I did off the base; I used MergeKit to do a TIES merge with the base and Instruct models, and then extracted an adapter from that merge.

This worked great when I was testing with HF Transformers, so I was surprised to get errors when trying to use these adapters with vLLM.

Tostino avatar Jan 02 '25 23:01 Tostino

The PR has been recreated here

sergeykochetkov avatar Jan 04 '25 08:01 sergeykochetkov

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions[bot] avatar Apr 05 '25 02:04 github-actions[bot]

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

github-actions[bot] avatar May 05 '25 02:05 github-actions[bot]