vllm icon indicating copy to clipboard operation
vllm copied to clipboard

Move Gemma's stacked_params_mapping to class scope

Open yhtang opened this issue 5 months ago • 5 comments

This aligns Gemma with the implementations of Gemma2/3, where packed_modules_mapping is defined as a class attribute. This allows us to query the stacking specifications from outside.

yhtang avatar Jun 19 '25 02:06 yhtang

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

github-actions[bot] avatar Jun 19 '25 02:06 github-actions[bot]

Currently, all models have their stacked_params_mapping in load_weights, not just the Gemma model. So I personally think these modifications may not be necessary

jeejeelee avatar Jun 19 '25 02:06 jeejeelee

Currently, all models have their stacked_params_mapping in load_weights, not just the Gemma model. So I personally think these modifications may not be necessary

Ah. I see that there are both stacked_params_mapping in GemmaModel.load_weights and packed_modules_mapping in GemmaForCausalLM. I'll see if what's already there could work. Thanks for the tips!

yhtang avatar Jun 19 '25 03:06 yhtang

That said, it looks like the same piece of information but repeated twice. Would it make sense to combine them into one?

yhtang avatar Jun 19 '25 03:06 yhtang

That said, it looks like the same piece of information but repeated twice. Would it make sense to combine them into one?

I think we probably can't combine them. For example, like baichuan, they don't have a completely corresponding relationship

jeejeelee avatar Jun 19 '25 10:06 jeejeelee