llama4 request
Llama 4 is huge; having the ability to somehow prune it or separate out the experts would help the community a great deal.
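For context, here is a minimal sketch of what "separating the experts" could look like: filtering a MoE checkpoint down to a single expert and renaming its weights into a plain dense-FFN layout. The `experts.{i}.` key pattern is an illustrative assumption, not the actual Llama 4 tensor naming, and this assumes a single-shard checkpoint for simplicity.

```python
# Hypothetical sketch: extract one expert from a MoE checkpoint.
# Assumes expert weights are keyed like "...experts.{i}....", which is an
# illustrative naming convention, not the real Llama 4 layout.
import re
from safetensors.torch import load_file, save_file

EXPERT_ID = 0  # which expert to keep

state = load_file("model.safetensors")  # single shard, for simplicity
kept = {}
for name, tensor in state.items():
    m = re.search(r"experts\.(\d+)\.", name)
    if m is None:
        kept[name] = tensor  # shared (non-expert) weights: keep as-is
    elif int(m.group(1)) == EXPERT_ID:
        # Rename the chosen expert's weights into a plain dense-FFN slot.
        kept[re.sub(r"experts\.\d+\.", "", name)] = tensor

save_file(kept, f"expert_{EXPERT_ID}.safetensors")
```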
+1
+1. Besides pruning and separating experts, the possibility of merging existing trained models into the new Llama 4 architecture would be interesting too. Imagine taking separate 1B models and merging them into the Llama 4 architecture; this would be very helpful for further experimentation.
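A rough sketch of that inverse direction: routing each dense donor's FFN weights into its own expert slot of a MoE-shaped state dict. Again, the `.mlp.` / `experts.{i}.` key patterns are assumed naming schemes for illustration only.

```python
# Hypothetical sketch: stack N dense checkpoints into MoE expert slots.
# The ".mlp." and "experts.{i}." key patterns are assumptions for illustration.
from safetensors.torch import load_file, save_file

dense_paths = ["model_a.safetensors", "model_b.safetensors"]  # donor models

moe_state = {}
for expert_id, path in enumerate(dense_paths):
    dense = load_file(path)
    for name, tensor in dense.items():
        if ".mlp." in name:
            # Route this donor's FFN weights into its own expert slot.
            moe_state[name.replace(".mlp.", f".mlp.experts.{expert_id}.")] = tensor
        elif expert_id == 0:
            moe_state[name] = tensor  # shared weights taken from the first donor

# Caveat: a real merge would also need router/gate weights, which have no
# counterpart in the dense donors and would have to be initialized separately.
save_file(moe_state, "merged_moe.safetensors")
```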
I see a commit adding a definition for Llama 4; however, I get an error when trying to actually run a merge:
```
RuntimeError: Tensor vision_model.model.layers.33.attention.v_proj.bias required but not present in model models--meta-llama--Llama-4-Maverick-17B-128E
```
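One way to check whether that tensor is genuinely absent from the downloaded checkpoint (as opposed to a mergekit architecture-mapping issue) is to inspect the shard index shipped with the model. A sketch, assuming the standard Hugging Face `model.safetensors.index.json` layout:

```python
# Check whether a tensor name appears in a sharded checkpoint's index.
# Assumes the standard HF "model.safetensors.index.json" shard index file.
import json

with open("model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

query = "vision_model.model.layers.33.attention.v_proj.bias"
print(query in weight_map)  # False means the checkpoint really lacks it

# List nearby tensors to see what that layer actually contains:
for name in sorted(weight_map):
    if "vision_model" in name and "layers.33.attention" in name:
        print(name)
```

If the bias is missing from the index, the fix likely belongs in mergekit's architecture definition (marking the bias optional) rather than in the checkpoint.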
Linking the related PR for easy tracking: arcee-ai/mergekit#552 (Add Llama4ForConditionalGeneration, by cg123).