Daize Dong
See if this version can be merged?
> any update on this PR?

Merged the latest branch and resolved conflicts.
Seems most benchmarks stay relatively stable, but `ifeval` regresses a lot. Is this reasonable?
Thank you for your attention to our project! That is a very good question! Unfortunately, according to our observations, the converted model without further fine-tuning is kind of "broken", i.e.,...
@pprp Sorry that we didn't run an ablation on the initialization method of the gate weights. However, this method can lead to better balance at initialization, and I believe...
@pprp Your images show that both models suffer a large performance loss after initialization, and this observation aligns with ours. I think you need to train the models with more tokens...
> [gpt_paper_assistant/configs/config.ini](https://github.com/tatsu-lab/gpt_paper_assistant/blob/5fbf2459ef6b95ea0da0baf5ec6a4083f15bcc5d/configs/config.ini#L23)
>
> Line 23 in [5fbf245](/tatsu-lab/gpt_paper_assistant/commit/5fbf2459ef6b95ea0da0baf5ec6a4083f15bcc5d)
>
> author_match = true
>
> This parameter seems to not be used. I couldn't find a way to turn off...