llama4 request
Llama 4 is huge; having the ability to somehow prune it or separate out the experts would help the community a great deal.
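For context, here is a minimal sketch of what "separating the experts" could look like: filtering a MoE checkpoint down to a single expert and renaming its weights into a plain dense-FFN layout. The `experts.{i}.` key pattern is an illustrative assumption, not the actual Llama 4 tensor naming, and this assumes a single-shard checkpoint for simplicity.

```python
# Hypothetical sketch: extract one expert from a MoE checkpoint.
# Assumes expert weights are keyed like "...experts.{i}....", which is an
# illustrative naming convention, not the real Llama 4 layout.
import re
from safetensors.torch import load_file, save_file

EXPERT_ID = 0  # which expert to keep

state = load_file("model.safetensors")  # single shard, for simplicity
kept = {}
for name, tensor in state.items():
    m = re.search(r"experts\.(\d+)\.", name)
    if m is None:
        kept[name] = tensor  # shared (non-expert) weights: keep as-is
    elif int(m.group(1)) == EXPERT_ID:
        # Rename the chosen expert's weights into a plain dense-FFN slot.
        kept[re.sub(r"experts\.\d+\.", "", name)] = tensor

save_file(kept, f"expert_{EXPERT_ID}.safetensors")
```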
+1
+1. Besides pruning and separating experts, the possibility of merging existing trained models into the new Llama 4 architecture would be interesting too. Imagine taking separate 1B models and merging them into the Llama 4 architecture; this would be very helpful for further experimentation.
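A rough sketch of that inverse direction: routing each dense donor's FFN weights into its own expert slot of a MoE-shaped state dict. Again, the `.mlp.` / `experts.{i}.` key patterns are assumed naming schemes for illustration only.

```python
# Hypothetical sketch: stack N dense checkpoints into MoE expert slots.
# The ".mlp." and "experts.{i}." key patterns are assumptions for illustration.
from safetensors.torch import load_file, save_file

dense_paths = ["model_a.safetensors", "model_b.safetensors"]  # donor models

moe_state = {}
for expert_id, path in enumerate(dense_paths):
    dense = load_file(path)
    for name, tensor in dense.items():
        if ".mlp." in name:
            # Route this donor's FFN weights into its own expert slot.
            moe_state[name.replace(".mlp.", f".mlp.experts.{expert_id}.")] = tensor
        elif expert_id == 0:
            moe_state[name] = tensor  # shared weights taken from the first donor

# Caveat: a real merge would also need router/gate weights, which have no
# counterpart in the dense donors and would have to be initialized separately.
save_file(moe_state, "merged_moe.safetensors")
```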
I see a commit adding a definition for Llama 4; however, I get an error when trying to actually run a merge:
```
RuntimeError: Tensor vision_model.model.layers.33.attention.v_proj.bias required but not present in model models--meta-llama--Llama-4-Maverick-17B-128E
```
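One way to check whether that tensor is genuinely absent from the downloaded checkpoint (as opposed to a mergekit architecture-mapping issue) is to inspect the shard index shipped with the model. A sketch, assuming the standard Hugging Face `model.safetensors.index.json` layout:

```python
# Check whether a tensor name appears in a sharded checkpoint's index.
# Assumes the standard HF "model.safetensors.index.json" shard index file.
import json

with open("model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

query = "vision_model.model.layers.33.attention.v_proj.bias"
print(query in weight_map)  # False means the checkpoint really lacks it

# List nearby tensors to see what that layer actually contains:
for name in sorted(weight_map):
    if "vision_model" in name and "layers.33.attention" in name:
        print(name)
```

If the bias is missing from the index, the fix likely belongs in mergekit's architecture definition (marking the bias optional) rather than in the checkpoint.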
Linking the related PR for easy tracking: arcee-ai/mergekit#552 (Add Llama4ForConditionalGeneration, by cg123).