mergekit

llama4 request

Open SicariusSicariiStuff opened this issue 9 months ago • 4 comments

Llama 4 is huge; the ability to prune it or separate its experts would help the community a great deal.

SicariusSicariiStuff avatar Apr 06 '25 05:04 SicariusSicariiStuff

+1

yukiarimo avatar Apr 10 '25 23:04 yukiarimo

+1. Besides pruning and separating experts, the ability to merge existing trained models into the new Llama 4 architecture would be interesting too. Imagine taking separate 1B models and merging them into the Llama 4 architecture; this would be very helpful for further experimentation.

davzoku avatar Apr 11 '25 01:04 davzoku

I see a commit that adds a definition for Llama 4; however, I get an error message when I actually try to merge:

RuntimeError: Tensor vision_model.model.layers.33.attention.v_proj.bias required but not present in model models--meta-llama--Llama-4-Maverick-17B-128E

MrJackSpade avatar Apr 13 '25 00:04 MrJackSpade
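
One way to check whether the tensor named in the error above is really absent from the downloaded checkpoint, rather than stored under a different name, is to read the tensor names straight from the safetensors file headers. This is a minimal sketch, assuming the checkpoint is stored as `.safetensors` shards in a single local directory. The helper names `read_safetensors_tensor_names` and `find_missing_tensors` are hypothetical and not part of mergekit; the sketch only relies on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header).

```python
import json
import struct
from pathlib import Path


def read_safetensors_tensor_names(path):
    """Return the tensor names listed in a .safetensors file header."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")


def find_missing_tensors(shard_dir, required_names):
    """Return the required names that appear in no shard under shard_dir.

    Hypothetical helper: it only inspects headers, so no weights are loaded.
    """
    present = set()
    for shard in sorted(Path(shard_dir).glob("*.safetensors")):
        present.update(read_safetensors_tensor_names(shard))
    return [name for name in required_names if name not in present]
```

For example, `find_missing_tensors("/path/to/snapshot", ["vision_model.model.layers.33.attention.v_proj.bias"])` returns a list containing that name if no shard has it, which would suggest the architecture definition expects a bias the checkpoint does not ship.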