Medusa
The implementation of stage 2 with axolotl
Thanks for the wonderful work.
I am trying to improve performance with Medusa-2. However, when I start stage 2 training from the model produced by stage 1, the Medusa loss is still high. According to the source code of the load_model() function, the Medusa heads do not seem to be initialized from the weights of the base model. Is this a bug?
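For reference, here is a rough sketch of the kind of check I have in mind: comparing the parameter names stored in the stage 1 checkpoint with the ones the stage 2 model expects, to see whether any head weights would be freshly initialized instead of loaded. The `medusa_head` prefix and all key names below are assumptions for illustration only; the actual names in the checkpoint may differ.

```python
def missing_head_keys(checkpoint_keys, model_keys, prefix="medusa_head"):
    """Return head parameter names the model expects but the checkpoint lacks.

    `prefix` is an assumed naming convention, not confirmed against the repo.
    """
    available = set(checkpoint_keys)
    return sorted(
        key for key in model_keys
        if key.startswith(prefix) and key not in available
    )


# Hypothetical key names, for illustration only.
stage1_checkpoint = [
    "lm_head.weight",
    "medusa_head.0.0.linear.weight",
]
stage2_model = [
    "lm_head.weight",
    "medusa_head.0.0.linear.weight",
    "medusa_head.1.0.linear.weight",
]

missing = missing_head_keys(stage1_checkpoint, stage2_model)
print(missing)
```

If this list is non-empty for the real checkpoint, those heads would start stage 2 from a fresh initialization, which might explain the high loss.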