Daize Dong

Results 8 comments of Daize Dong

> any update on this PR?

Merged the latest branch and resolved conflicts.

Seems most benchmarks stay relatively stable, but `ifeval` regresses a lot. Is this reasonable?

Thank you for your attention to our project! That is a very good question! Unfortunately, according to our observations, the converted model w/o further finetuning is kind of "broken", i.e.,...

@pprp Sorry, we didn't run experiments ablating the initialization method of the gate weights. However, this method can lead to better balance at initialization, and I believe...

@pprp Your images show that both models suffer a large performance loss after initialization, and this observation aligns with ours. I think you need to train the models with more tokens...

> [gpt_paper_assistant/configs/config.ini](https://github.com/tatsu-lab/gpt_paper_assistant/blob/5fbf2459ef6b95ea0da0baf5ec6a4083f15bcc5d/configs/config.ini#L23)
>
> Line 23 in [5fbf245](/tatsu-lab/gpt_paper_assistant/commit/5fbf2459ef6b95ea0da0baf5ec6a4083f15bcc5d)
>
> author_match = true
>
> This parameter seems to not be used. I couldn't find a way to turn off...