Haosheng Zou (邹昊晟)

Results: 5 issues by Haosheng Zou (邹昊晟)

Does seq_length in config.json fully reflect the context-window length used in pretraining? Was the 72B model trained from scratch with a 32k window?

According to config.json, inference is vanilla within 32k and dynamic NTK beyond 32k? Is the 200k needle-in-a-haystack test also run with this inference scheme? I implemented a preliminary version of "vanilla within 32k, dynamic NTK beyond 32k" inference in lightllm and tested internlm2-chat-7b on a few cases of the original needle-in-a-haystack benchmark at 200k length, but it produced garbled English output. Am I using the wrong model, or is something off in my inference settings? Does scaling_factor need to be set explicitly? How can I reproduce the 7B-200k needle-in-a-haystack results from the official WeChat-account post?
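For reference, a minimal sketch of the "vanilla within 32k, dynamic NTK beyond 32k" rule as it appears in HF transformers' dynamic NTK RoPE scaling; the `dim`, `base`, and `max_pos` defaults below are assumptions for an InternLM2-7B-style config, not confirmed values:

```python
import torch

def dynamic_ntk_inv_freq(seq_len: int,
                         dim: int = 128,           # assumed per-head dimension
                         base: float = 1_000_000.0,  # assumed rope_theta
                         max_pos: int = 32768,     # trained window (seq_length)
                         scaling_factor: float = 1.0) -> torch.Tensor:
    # Within the trained window: plain RoPE frequencies.
    # Beyond it: rescale the RoPE base so positions stay in-distribution
    # (the dynamic NTK formula used by HF transformers).
    if seq_len > max_pos:
        base = base * (
            (scaling_factor * seq_len / max_pos) - (scaling_factor - 1)
        ) ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
```

Note that even with scaling_factor=1.0 the formula still rescales base by (seq_len/max_pos)**(dim/(dim-2)) once seq_len exceeds the trained window, which is one reason the question above asks whether scaling_factor needs an explicit value.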

question

### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-Alignment/safe-rlhf/discussions) that this hasn't already been reported. (+1 or comment...

question

Any plans on releasing the DPO code, or a brief intro of how you conducted long-context DPO?
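For readers unfamiliar with DPO, here is a minimal sketch of the standard loss. The long-context recipe asked about here is unreleased, so how the per-sequence log-probs are computed and sharded over very long inputs is exactly the open question; all names below are illustrative:

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    # Standard DPO: maximize the margin between the policy's implicit
    # reward on the chosen vs. rejected response, measured relative to
    # a frozen reference model.
    chosen_reward = beta * (pi_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (pi_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```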

# What does this PR do?

Add Sequence Parallelism (#4733, #5024, #5207, #5815, #5841, etc.); direct plug-and-play use at https://github.com/Qihoo360/360-LLaMA-Factory. We have a separate README and chat group at https://github.com/Qihoo360/360-LLaMA-Factory, only...
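To illustrate what sequence parallelism means at the data level, here is a toy sketch of the sequence-dimension sharding only. The actual 360-LLaMA-Factory implementation also patches attention (e.g. DeepSpeed-Ulysses-style all-to-all or ring attention) so each head still attends over the full sequence, which this sketch omits:

```python
import torch
import torch.distributed as dist

def shard_sequence(hidden: torch.Tensor) -> torch.Tensor:
    """Keep only this rank's contiguous slice of the sequence dimension,
    so the activations of a very long sequence are split across GPUs.
    Assumes an initialized process group, hidden shaped [batch, seq, dim],
    and seq divisible by the world size (simplifications for the sketch)."""
    world, rank = dist.get_world_size(), dist.get_rank()
    chunk = hidden.shape[1] // world
    return hidden[:, rank * chunk:(rank + 1) * chunk, :]
```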

pending