Issues by Xiaosong,He (3 results)
I'm trying to do this with my own implementation (not with T5), but the initial loss is pretty high (10+). I'm not sure whether this is normal.
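One way to check whether a loss of 10+ is plausible at step 0, assuming the model is trained from randomly initialized weights: a freshly initialized output layer produces predictions close to uniform over the vocabulary, and the cross-entropy of a uniform prediction over V tokens is ln V nats. The sketch below assumes token-level cross-entropy in nats (the PyTorch default). It uses T5's default 32,128-token vocabulary only as a stand-in, since the vocabulary size of the implementation in question is not stated. The helper names and the 1-nat tolerance are hypothetical.

```python
import math


def cross_entropy(logits: list[float], target: int) -> float:
    """Token-level cross-entropy in nats, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def uniform_baseline_loss(vocab_size: int) -> float:
    """Loss of a model that assigns equal probability to every token: ln(V)."""
    return math.log(vocab_size)


def is_near_uniform(initial_loss: float, vocab_size: int, tol: float = 1.0) -> bool:
    """Hypothetical check: is the observed initial loss within `tol` nats of ln(V)?"""
    return abs(initial_loss - uniform_baseline_loss(vocab_size)) <= tol


# Assumed vocabulary size (T5's default); substitute the real one.
VOCAB_SIZE = 32128

# All-zero logits give a uniform softmax, so the loss equals ln(V) for any target.
print(round(cross_entropy([0.0] * VOCAB_SIZE, target=5), 2))  # 10.38
print(is_near_uniform(10.5, VOCAB_SIZE))  # True
```

Under these assumptions, a starting loss slightly above 10 is what near-uniform predictions would give. If the implementation instead starts from pretrained weights, a loss near ln V would suggest the weights were not loaded or the labels are misaligned.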
### Model Series
Qwen2.5

### What are the models used?
Qwen2.5-7b-base

### What is the scenario where the problem happened?
SFT with the Hugging Face Trainer

### Is this a known issue?...