Xiaosong,He

3 issues by Xiaosong,He

I'm trying to do this with my own implementation (not with T5), but the initial loss is pretty high (10+). I'm not sure whether this is normal.
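A rough sanity check, assuming the reported loss is token-level cross-entropy in nats and the model starts from random initialization: a model that spreads probability evenly over a vocabulary of V tokens has loss ln V, so a starting loss a little above 10 may just be this baseline rather than a bug. The vocabulary size below is a hypothetical value; substitute the one your implementation actually uses.

```python
import math


def uniform_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy (nats) of a model assigning probability 1/V to every token."""
    # -log(1/V) == log(V)
    return math.log(vocab_size)


# Hypothetical vocabulary size (assumption), roughly the scale of T5's vocabulary.
VOCAB_SIZE = 32_000
print(f"{uniform_cross_entropy(VOCAB_SIZE):.2f}")  # → 10.37
```

If the observed step-0 loss is far above ln V, that would point more toward a problem such as mismatched labels or unmasked padding than toward normal initialization.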

### Model Series
Qwen2.5

### What are the models used?
Qwen2.5-7b-base

### What is the scenario where the problem happened?
SFT with the Hugging Face Trainer

### Is this a known issue?...