Qwen [BUG] After pulling the latest 14B model, fine-tuning with the same code gives a noticeably higher loss than before. Were any changes made?

Open boundles opened this issue 1 year ago • 3 comments

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

Current Behavior

No response

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

Dec 24 '23 02:12 boundles

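Were any changes made?

For 14B, the model parameters have not been updated. The related code has some changes that unify behavior across models, mainly speed-related optimizations. These changes should not affect training results. Could you tell us roughly how much higher the loss is?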

Dec 25 '23 05:12 jklj077

Previously, the loss at the end of training was around 0.001x; over several recent attempts it has been around 0.03x.

Dec 25 '23 06:12 boundles

May I ask what server configuration you ran this on? Thanks.

Dec 29 '23 02:12 Hazards10

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

Apr 25 '24 08:04 github-actions[bot]
