[BUG] After pulling the latest 14B model, fine-tuning with the same code gives a noticeably higher loss than before. Was anything changed?

Open boundles opened this issue 1 year ago • 3 comments

Is there an existing issue / discussion for this?

  • [X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • [X] I have searched FAQ

Current Behavior

No response

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

boundles • Dec 24 '23 02:12

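> Was anything changed?

For the 14B model, the parameters have not been updated. The accompanying code has some changes that were unified across models, mainly speed-related optimizations. These changes should not affect training results. Could you share roughly how much higher the loss is?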

jklj077 • Dec 25 '23 05:12

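> Was anything changed?
>
> For the 14B model, the parameters have not been updated. The accompanying code has some changes that were unified across models, mainly speed-related optimizations. These changes should not affect training results. Could you share roughly how much higher the loss is?

Previously the loss at the end of training was around 0.001x; over several recent attempts it has been around 0.03x.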

boundles • Dec 25 '23 06:12

Could I ask what server configuration you ran this on? Thanks.

Hazards10 • Dec 29 '23 02:12

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

github-actions[bot] • Apr 25 '24 08:04