mergekit
Idea regarding the new 8x22b Mixtral model and the inverse of 'model stock' method
I see people are trying to extract the Mistral-22b ancestor from the MoE model by averaging the MLP layers, and I wondered whether the 'model stock' method in mergekit could be inverted:
- Use the averaged model as a proxy for the centre found by the forward 'model stock' method.
- Project back from the centre model to try to find the base model.
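
To make the two steps above concrete, here is a rough sketch in plain Python on flat weight vectors. It assumes the forward method interpolates as `centre = t * avg + (1 - t) * base` with `t = k*cos(theta) / (1 + (k-1)*cos(theta))`, which is my reading of the Model Stock paper, and then simply solves that equation for `base`. The function names (`stock_ratio`, `forward_centre`, `invert_base`) are made up for illustration and are not mergekit APIs. The big caveat is that `t` depends on the angle measured from the unknown base, so in a real inversion it would have to be guessed or estimated some other way.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flat weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def stock_ratio(cos_theta, k):
    """Interpolation ratio t for k fine-tuned models (assumed formula)."""
    return k * cos_theta / (1 + (k - 1) * cos_theta)


def forward_centre(base, experts):
    """Forward direction: known base + experts -> (centre, avg, t)."""
    k = len(experts)
    # Task vectors: each expert minus the base.
    deltas = [[e - b for e, b in zip(expert, base)] for expert in experts]
    # Mean pairwise cosine between task vectors.
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    cos_theta = sum(cosine(deltas[i], deltas[j]) for i, j in pairs) / len(pairs)
    t = stock_ratio(cos_theta, k)
    avg = [sum(column) / k for column in zip(*experts)]
    centre = [t * a + (1 - t) * b for a, b in zip(avg, base)]
    return centre, avg, t


def invert_base(centre, avg, t):
    """Inverse direction (hypothetical): solve centre = t*avg + (1-t)*base."""
    if abs(1 - t) < 1e-12:
        raise ValueError("t is ~1, so the base cannot be recovered")
    return [(c - t * a) / (1 - t) for c, a in zip(centre, avg)]


if __name__ == "__main__":
    base = [0.5, -1.0, 2.0]
    experts = [[1.5, 0.0, 2.0], [1.5, -1.0, 3.0]]
    centre, avg, t = forward_centre(base, experts)
    print("t =", t)
    print("recovered base =", invert_base(centre, avg, t))
```

One thing this makes visible: if the same averaged tensor is passed as both `centre` and `avg`, `invert_base` just returns that tensor for any `t < 1`, so some independent estimate of either the centre or `t` would be needed for the projection to move anywhere.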
I have no idea whether it could work, and I don't expect you to go to a lot of trouble to try it, but if anyone reading is interested, or knows the people currently trying to recover the 22b ancestor, it could be worth investigating.