
Support for MoE Phi-2

Open beratcmn opened this issue 1 year ago • 2 comments

Phi-2 MoE

Since Microsoft changed Phi-2's license to MIT, we are able to use the model even for commercial projects. I think it's a great candidate for MoE due to its small size and high-quality pre-training data.

I don't know whether it's possible or not, but I think it would be a great addition.

beratcmn avatar Jan 07 '24 15:01 beratcmn

This is possible if the architecture can be converted into Llama/Mistral-format weights (see #82). The way mergekit-moe works depends on being able to use the Mixtral architecture for the output. Beyond that script, this is definitely possible - it would need custom code to run inference on the model though. I definitely don't have the bandwidth to pursue custom-code MoE architectures at the moment, but if #82 pans out this will come for free.
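For context, a minimal mergekit-moe config for models that are already Mistral-format looks something like the sketch below (model names and prompts here are placeholders, not a working Phi-2 recipe - Phi-2 would first need the weight conversion discussed above):

```yaml
# Hypothetical mergekit-moe config sketch; the expert models and
# prompts are illustrative placeholders, not a tested recipe.
base_model: mistralai/Mistral-7B-v0.1
experts:
  - source_model: some-org/mistral-7b-code-finetune   # placeholder
    positive_prompts:
      - "write a function"
  - source_model: some-org/mistral-7b-chat-finetune   # placeholder
    positive_prompts:
      - "tell me about"
```

The experts' `positive_prompts` are used to initialize the Mixtral-style router gates, which is why the output has to be expressible in the Mixtral architecture in the first place.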

cg123 avatar Jan 09 '24 06:01 cg123

Have you seen Phixtral by @mlabonne ? https://x.com/maximelabonne/status/1744867841436700850

fakerybakery avatar Jan 10 '24 02:01 fakerybakery