Medusa: The implementation of stage 2 with axolotl

Open boxiaowave opened this issue 9 months ago • 0 comments

Thanks for the wonderful work.

I am trying to improve performance with Medusa-2. However, when I start stage 2 training from the stage 1 model, the Medusa loss is still high. Judging from the source code of the load_model() function, it seems that the Medusa heads are not initialized from the weights of the base model. Is this a bug?
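A minimal sketch of one way to narrow this down: compare the parameter names the stage 2 model expects with the names stored in the stage 1 checkpoint, and list any head parameters the checkpoint does not provide (those would keep their fresh initialization). The helper `split_head_keys`, the marker `"medusa_head"`, and the toy key names are assumptions for illustration only; the real names depend on the implementation.

```python
def split_head_keys(model_keys, checkpoint_keys, head_marker="medusa_head"):
    """Split the model's head parameter names into those the checkpoint
    provides and those it does not (hypothetical helper)."""
    available = set(checkpoint_keys)
    # Keep only parameters that belong to the heads, in model order.
    head_keys = [k for k in model_keys if head_marker in k]
    found = [k for k in head_keys if k in available]
    missing = [k for k in head_keys if k not in available]
    return found, missing


# Toy key lists standing in for the model's and the checkpoint's
# state-dict keys; these names are assumed, not taken from the repo.
model_keys = [
    "lm_head.weight",
    "medusa_head.0.1.weight",
    "medusa_head.1.1.weight",
]
checkpoint_keys = ["lm_head.weight", "medusa_head.0.1.weight"]

found, missing = split_head_keys(model_keys, checkpoint_keys)
print("found in checkpoint:", found)
print("missing from checkpoint:", missing)
```

With PyTorch, the two key lists could come from `model.state_dict().keys()` and from the loaded checkpoint's state dict. A non-empty `missing` list would suggest the heads are not being restored from stage 1.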

May 24 '24 02:05 boxiaowave
