
Sync HuggingFace modifications of the Qwen MoE model

Open eigen2017 opened this issue 1 year ago • 10 comments

Recently, HuggingFace merged my PR: https://github.com/huggingface/transformers/pull/30552/files

I introduced a new config, "mlp_only_layers", to the Qwen MoE model. I think vLLM should keep the same model forward logic as the HuggingFace model definitions, so this PR syncs the HuggingFace modifications of the Qwen MoE model.

The new config "mlp_only_layers" specifies which layers' experts to cut, to fit limited HBM or other creative scenarios (a sketch of the per-layer decision follows).
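
For context, here is a minimal sketch of the per-layer decision, assuming the condition that was merged into transformers' Qwen2MoeDecoderLayer; the helper name and signature below are mine, for illustration only:

```python
def layer_uses_experts(layer_idx: int, mlp_only_layers: list,
                       num_experts: int, decoder_sparse_step: int) -> bool:
    """Decide whether a decoder layer gets a sparse MoE block or a dense MLP.

    Sketch of the condition merged into transformers' Qwen2MoeDecoderLayer;
    not the exact upstream code.
    """
    if layer_idx in mlp_only_layers:
        # Experts for this layer are cut; fall back to a dense MLP.
        return False
    return num_experts > 0 and (layer_idx + 1) % decoder_sparse_step == 0


# Example: with mlp_only_layers=[1], layer 1 uses a dense MLP even though
# decoder_sparse_step=1 would otherwise make every layer sparse.
assert layer_uses_experts(0, [1], num_experts=60, decoder_sparse_step=1)
assert not layer_uses_experts(1, [1], num_experts=60, decoder_sparse_step=1)
```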

  1. The change in Qwen2MoeDecoderLayer syncs the HuggingFace modifications of the model.
  2. The change in load_weights handles the case where mlp_only_layers is not empty: "self.named_parameters()" then no longer contains all the weights (because some experts have been cut), which causes an exception at "param = params_dict[name]".
  3. This is also a bug in the original version of qwen2_moe.py: when decoder_sparse_step > 1 is set, experts are cut as well, name is not in params_dict, and params_dict[name] raises an exception.
  4. In any case, checking whether the key "name" exists in params_dict before reading params_dict[name] is safer (see the sketch after this list).
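
A minimal sketch of the guard described in points 2 to 4; the function and variable names here are illustrative, not the exact vLLM diff (in particular, real vLLM loading goes through per-parameter weight loaders, simplified here to a plain copy):

```python
import torch


def load_weights_with_guard(model: torch.nn.Module, weights):
    """Sketch of the defensive loading loop; `weights` yields
    (name, tensor) pairs read from the checkpoint."""
    params_dict = dict(model.named_parameters())
    for name, loaded_weight in weights:
        # When mlp_only_layers is non-empty (or decoder_sparse_step > 1),
        # some expert weights exist in the checkpoint but have no matching
        # parameter in the model, so indexing params_dict[name] directly
        # would raise KeyError.
        if name not in params_dict:
            continue
        param = params_dict[name]
        param.data.copy_(loaded_weight)
```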

I'm willing to contribute to the great vLLM project ~~ any reply is welcome~ Thanks ^_^

eigen2017 avatar May 12 '24 17:05 eigen2017

I'll fix the CI warnings soon.

eigen2017 avatar May 12 '24 17:05 eigen2017

This issue can be solved: https://github.com/vllm-project/vllm/issues/4369

And these issues with MoE models can also be solved: https://github.com/vllm-project/vllm/issues/3931 https://github.com/vllm-project/vllm/issues/3563

eigen2017 avatar May 13 '24 16:05 eigen2017

@WoosukKwon @zhuohan123 Could you please arrange someone to review my PR? Thanks~~

eigen2017 avatar May 13 '24 16:05 eigen2017

The AMD test said: requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json

Failed to connect to localhost port 8000: Connection refused timeout 600 bash -c 'until curl localhost:8000/v1/models; do sleep 1; done'

It's likely a bug in the scanner; I'll try it again.

eigen2017 avatar May 14 '24 13:05 eigen2017

@WoosukKwon @zhuohan123 The CI has now passed. Could you please give it a review? Thanks ^_^

eigen2017 avatar May 14 '24 15:05 eigen2017

@JustinLin610 can you help take a look at this? Thank you! 🙏

simon-mo avatar May 14 '24 23:05 simon-mo

> @JustinLin610 can you help take a look at this? Thank you! 🙏

Thank you for replying!

@JustinLin610 Hi ^_^ As a Qwen team member, could you please give this PR a review? Thanks!

eigen2017 avatar May 15 '24 07:05 eigen2017

@JustinLin610 @yangapku @simonJJJ @logicwong @JianxinMa @hzhwcmhf @fyabc @huybery To all the top experts on the Alibaba Qwen team: could anyone help confirm this PR? Any input is welcome. Many thanks!!

eigen2017 avatar May 15 '24 15:05 eigen2017

> I think the PR is functionally okay.

Thank you for your approval!!

@simon-mo Hi~ It seems the Qwen members are all very busy...

eigen2017 avatar May 16 '24 16:05 eigen2017

Bo is our member. Yes, I just noticed that this has been merged into transformers. It's somewhat unnecessary, since we do not use this setup for our models, but functionally it is OK. I think it is okay to merge it. @eigen2017 @simon-mo

JustinLin610 avatar May 17 '24 02:05 JustinLin610

> Bo is our member. Yes, I just noticed that this has been merged into transformers. It's somewhat unnecessary, since we do not use this setup for our models, but functionally it is OK. I think it is okay to merge it. @eigen2017 @simon-mo

Thank you very much! Yes, it's functionally OK and doesn't change anything if not set.

@simon-mo If anything else is needed to merge this PR, please tell me.

eigen2017 avatar May 17 '24 11:05 eigen2017

config.mlp_only_layers should have a default value.

78 avatar May 24 '24 22:05 78
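
On that last point, a minimal sketch of one way to default it, assuming the empty-list default that the transformers config appears to use (class trimmed to the relevant field, for illustration):

```python
class Qwen2MoeConfig:
    """Minimal sketch of the relevant config field; not the full
    transformers Qwen2MoeConfig."""

    def __init__(self, mlp_only_layers=None, **kwargs):
        # Default to an empty list: no layer's experts are cut unless the
        # user opts in, so existing checkpoints behave exactly as before.
        self.mlp_only_layers = [] if mlp_only_layers is None else mlp_only_layers
```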