mergekit

Idea regarding the new Mixtral 8x22b model and the inverse of the 'model stock' method

Open • jukofyork opened this issue 10 months ago • 20 comments

I see people are trying to extract the dense Mistral-22b ancestor from the MoE model by averaging the experts' MLP layers, and I wondered whether the 'model stock' method in Mergekit could be inverted:

  • Use the averaged model as a proxy for the centre found by the forward 'model stock' method.
  • Project back from this centre to try to recover the base model.
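To make the two steps concrete, here is a minimal NumPy sketch, assuming each expert's MLP weights are plain arrays of the same shape. The function names are hypothetical and are not mergekit's API. It shows the expert averaging people are doing, the forward Model Stock interpolation from the Model Stock paper (Jang et al., 2024), w_H = t·w_avg + (1 − t)·w_0 with t = k·cos θ / (1 + (k − 1)·cos θ), and that same equation solved for the base w_0. Estimating cos θ as the mean pairwise cosine between task vectors (w_i − w_0) is an assumption of this sketch.

```python
import numpy as np


def average_experts(expert_weights):
    """Average same-shaped expert weight arrays into one dense tensor (hypothetical helper)."""
    return np.mean(np.stack(expert_weights), axis=0)


def model_stock_ratio(cos_theta, k):
    """Interpolation ratio t = k*cos(theta) / (1 + (k - 1)*cos(theta))."""
    return k * cos_theta / (1.0 + (k - 1) * cos_theta)


def mean_pairwise_cosine(base, finetuned):
    """Mean cosine similarity between task vectors (w_i - w_0) over all pairs (assumed estimator)."""
    deltas = [(w - base).ravel() for w in finetuned]
    cosines = []
    for i in range(len(deltas)):
        for j in range(i + 1, len(deltas)):
            a, b = deltas[i], deltas[j]
            cosines.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(cosines))


def model_stock_forward(base, finetuned):
    """Forward Model Stock: w_H = t * w_avg + (1 - t) * w_0; returns (w_H, t)."""
    t = model_stock_ratio(mean_pairwise_cosine(base, finetuned), len(finetuned))
    w_avg = average_experts(finetuned)
    return t * w_avg + (1.0 - t) * base, t


def model_stock_inverse(w_h, w_avg, t):
    """Solve w_H = t * w_avg + (1 - t) * w_0 for w_0 (undefined when t == 1)."""
    return (w_h - t * w_avg) / (1.0 - t)


if __name__ == "__main__":
    # Synthetic round trip: build "experts" around a known base, merge, then invert.
    rng = np.random.default_rng(0)
    base = rng.normal(size=(4, 8))
    experts = [base + 0.3 + 0.3 * rng.normal(size=base.shape) for _ in range(8)]
    w_h, t = model_stock_forward(base, experts)
    recovered = model_stock_inverse(w_h, average_experts(experts), t)
    print(np.allclose(recovered, base))  # True
```

The round trip only succeeds because w_0 was used to build w_H and t in the first place. For the Mixtral experts, both the centre w_H and the ratio t (equivalently cos θ) would have to be assumed or estimated some other way. If the plain expert average is used as the centre, the inverse formula simply returns that same average.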

I have no idea whether it would work and don't expect you to go to a lot of trouble trying it, but if anyone reading this is interested, or knows the people currently trying to recover the 22b ancestor, it could be worth investigating.

jukofyork, Apr 13 '24 11:04