Support for MoE Phi-2
Phi-2 MoE
Since Microsoft changed Phi-2's license to MIT, we are able to use the model even for commercial projects. I think it's a great candidate for MoE due to its small size and high-quality pre-training data.
I don't know whether it's possible or not, but I think it would be a great addition.
This is possible if the architecture can be converted into Llama/Mistral-format weights (see #82). The way mergekit-moe works depends on being able to use the Mixtral architecture for the output. Beyond that script, this is definitely possible - it would need custom code to run inference on the model though. I definitely don't have the bandwidth to pursue custom-code MoE architectures at the moment, but if #82 pans out this will come for free.
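For reference, a mergekit-moe run is driven by a YAML config along these lines. This is only a sketch to show the shape of such a config if #82 lands - the Phi-2 repo names and prompts are placeholders, and it will not work today because the output must be Mixtral-architecture:

```yaml
# Hypothetical sketch: mergekit-moe currently emits a Mixtral-architecture model,
# so Phi-2 experts won't work until a Llama/Mistral-format conversion (#82) exists.
base_model: microsoft/phi-2            # placeholder for the shared (non-expert) weights
gate_mode: cheap_embed                 # one of: hidden, cheap_embed, random
dtype: float16
experts:
  - source_model: microsoft/phi-2      # placeholder expert checkpoint
    positive_prompts:
      - "reason step by step"
  - source_model: microsoft/phi-2      # placeholder expert checkpoint
    positive_prompts:
      - "write a python function"
```

The merge itself would then be produced with something like `mergekit-moe config.yml ./phi-2-moe`, same as for Mistral-based experts.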
Have you seen Phixtral by @mlabonne? https://x.com/maximelabonne/status/1744867841436700850