Sync HuggingFace modifications of the Qwen MoE model
HuggingFace has nearly merged my PR: https://github.com/huggingface/transformers/pull/30552/files
I introduced a new config, `mlp_only_layers`, to the Qwen MoE model. I think vLLM should keep the same model forward logic as the HuggingFace model definitions, so this PR syncs the HuggingFace modifications of the Qwen MoE model.
The new config `mlp_only_layers` specifies which layers have their experts cut and replaced with a plain MLP, to fit limited HBM or other creative scenarios.
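For illustration, a minimal sketch of the layer-selection logic this config implies (the helper name `use_moe_block` is my own; the condition mirrors the HuggingFace `Qwen2MoeDecoderLayer` definition, and the exact wording in vLLM may differ):

```python
def use_moe_block(layer_idx: int, config) -> bool:
    """Decide whether a decoder layer uses the sparse MoE block or a plain MLP.

    A layer falls back to a dense MLP when it is listed in
    config.mlp_only_layers, or when it does not land on the
    decoder_sparse_step grid.
    """
    mlp_only_layers = getattr(config, "mlp_only_layers", [])
    return (layer_idx not in mlp_only_layers) and (
        config.num_experts > 0
        and (layer_idx + 1) % config.decoder_sparse_step == 0
    )
```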
- The change in `Qwen2MoeDecoderLayer` syncs the HuggingFace modifications of the model.
- The change in `load_weights` is needed because, when `mlp_only_layers` is not empty, `self.named_parameters()` does not contain all of the checkpoint weights (some experts have been cut), which causes an exception at `param = params_dict[name]`.
- This is also a bug in the original version of qwen2_moe.py: when `decoder_sparse_step > 1` is set, experts are cut as well, so `name` is not in `params_dict` and `params_dict[name]` raises an exception.
- In any case, checking whether the key `name` exists in `params_dict` before reading `params_dict[name]` is safer; see the sketch after this list.
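A minimal sketch of that guarded lookup, assuming the usual vLLM `load_weights` loop (the real implementation also handles stacked and expert parameter-name mappings, and the `default_weight_loader` import path may differ between vLLM versions):

```python
from vllm.model_executor.model_loader.weight_utils import default_weight_loader


def load_weights(self, weights):
    # Only parameters that actually exist in this model instance are listed
    # here; layers whose experts were cut via mlp_only_layers or
    # decoder_sparse_step simply do not appear.
    params_dict = dict(self.named_parameters())
    for name, loaded_weight in weights:
        # Skip checkpoint tensors that have no matching parameter instead of
        # raising a KeyError on params_dict[name].
        if name not in params_dict:
            continue
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", default_weight_loader)
        weight_loader(param, loaded_weight)
```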
I'm willing to contribute to the great vLLM ~~ any reply is welcome~ Thanks ^_^
I'll fix the CI warnings soon.
This issue can be solved: https://github.com/vllm-project/vllm/issues/4369
And these issues can be solved when using MoE models: https://github.com/vllm-project/vllm/issues/3931 https://github.com/vllm-project/vllm/issues/3563
@WoosukKwon @zhuohan123 could you please arrange for someone to review my PR? Thanks~~
The AMD test said: requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/config.json
Failed to connect to localhost port 8000: Connection refused timeout 600 bash -c 'until curl localhost:8000/v1/models; do sleep 1; done'
It's likely a bug in the test runner; I'll try it again.
@WoosukKwon @zhuohan123 the CI has now passed; could you please give a review? Thanks ^_^
@JustinLin610 can you help take a look at this? Thank you! 🙏
Thank you for replying!
@JustinLin610 Hi ^_^, Qwen member, could you please give this PR a review? Thanks!
@JustinLin610 @yangapku @simonJJJ @logicwong @JianxinMa @hzhwcmhf @fyabc @huybery Dear Alibaba Qwen members and lighthouse-level experts, could anyone help confirm this PR? Any opinions are welcome, many thanks!!
I think the PR is functionally okay.
Thank you for your approval!!
@simon-mo Hi ~ it seems the Qwen members are all very busy ...
Bo is our member. Yes, I just noticed that this has been merged into transformers. It is not strictly necessary, since we do not use this setup for our models, but it is functionally OK. I think it is okay to merge it. @eigen2017 @simon-mo
Thank you very much! Yes, it's functionally OK and doesn't change anything if the config is not set.
@simon-mo If anything else is needed to merge this PR, please tell me.
`config.mlp_only_layers` should have a default value.
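For example, the model code could fall back to an empty list when the attribute is absent from older configs (a sketch of one way to do it, not necessarily the exact wording used in the PR):

```python
# Assumed handling: treat a missing attribute on older Qwen MoE configs as
# "no layers cut", so behaviour is unchanged when mlp_only_layers is not set.
mlp_only_layers = ([] if not hasattr(config, "mlp_only_layers")
                   else config.mlp_only_layers)
```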