Yushi Bai

Results: 102 comments of Yushi Bai

It looks like your test results do not differ much from ours; I'd say the difference is within random error. May I ask what truncation method you are using?

Hi! Please refer to Section 4.2 of our paper for details on DPO. We use the same codebase as [ChatGLM-RLHF](https://arxiv.org/abs/2404.00934). We currently have no plan to release the code...

Hi, our DPO code is based on Megatron-LM.
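
For the gist of the objective itself: the standard DPO loss can be sketched in a few lines of PyTorch. This is an illustrative snippet under standard assumptions, not the authors' Megatron-LM implementation; all function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss from sequence-level log-probabilities of the
    chosen/rejected responses under the policy and a frozen reference model."""
    # Implicit rewards: scaled log-ratios of policy to reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen reward above the rejected one
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```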

The `shift_weights` here have already been normalized: the weights within each sample sum to 1.
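
For illustration, per-sample normalization of this kind can be written as below; this is a minimal sketch with hypothetical names, not the repository's actual code.

```python
import torch

def normalize_weights(weights: torch.Tensor) -> torch.Tensor:
    """Normalize token-level weights so that each sample (row) sums to 1."""
    # weights: (batch_size, seq_len); clamp avoids division by zero
    return weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-8)
```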

The GPT-3.5-Turbo-16k model evaluated in our paper has already been deprecated. You can try gpt-3.5-turbo-0125 (16k), or the most recent gpt-4o-mini (128k), according to OpenAI (https://platform.openai.com/docs/models).

Right. We didn't provide code for evaluating API models. You can modify the [get_pred()](https://github.com/THUDM/LongBench/blob/main/pred.py#L51) function to do so.
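
A minimal sketch of such a modification, assuming the official `openai` Python client; `get_pred_api` and the chosen model name are illustrative, not part of the LongBench codebase.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_pred_api(prompt: str, max_new_tokens: int = 512,
                 model: str = "gpt-4o-mini") -> str:
    """Send the already-built LongBench prompt to an API model
    instead of generating with a local model."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_new_tokens,
        temperature=0.0,  # near-greedy decoding for reproducible scores
    )
    return response.choices[0].message.content
```

You would still want to truncate the prompt before the request, since API models enforce their own context-length limits.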

1. We recommend doing long-context alignment fine-tuning (SFT, DPO) on a base model that has already undergone length extension.
2. GPU memory usage depends on the sequence length; for example, in our paper, training at 64k length with ZeRO-3 requires 80 GB of GPU memory (see the configuration sketch below).
3. If your base model has already been continually trained on longer sequences (i.e., length extension), fine-tuning alone is enough; otherwise you need to do that continual training first.
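
For reference, a minimal DeepSpeed ZeRO-3 configuration of the kind mentioned in point 2 might look like the following; the batch size, precision, and accumulation values are illustrative and depend on your hardware, not taken from the paper's setup.

```python
# Minimal DeepSpeed ZeRO-3 config as a Python dict (illustrative values)
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # long sequences usually force a tiny batch
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```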

Hi, I suspect a misalignment in the chat prompt template, but I'm not sure how `client.chat` deals with the chat template. Can you provide more details?
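
To make "applying the chat template" concrete: with a Hugging Face tokenizer it can be done explicitly as below. This is only a sketch (the model name is just an example), and may not reflect what `client.chat` does internally.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/LongWriter-glm4-9b", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a 10000-word story."}]
# Render the conversation with the model's own template so the prompt
# format matches what the model saw during fine-tuning
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # inspect for missing role tags or special tokens
```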

Hi, Long (>128k) is just a subset of the evaluation data: the set of test samples whose length exceeds 128k tokens. For evaluation on all data we use `--max_model_len 131072` and truncate sequences longer than 128k tokens.
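
As a sketch of what such truncation can look like: LongBench's pred.py uses middle truncation, keeping the head and tail of the prompt and dropping the middle. The function below illustrates the idea with a Hugging Face tokenizer; the name and exact details are illustrative, not the released evaluation code.

```python
def truncate_middle(tokenizer, prompt: str, max_len: int = 131072) -> str:
    """Keep the first and last max_len/2 tokens of an over-long prompt."""
    tokens = tokenizer.encode(prompt)
    if len(tokens) <= max_len:
        return prompt
    half = max_len // 2
    return (tokenizer.decode(tokens[:half], skip_special_tokens=True)
            + tokenizer.decode(tokens[-half:], skip_special_tokens=True))
```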

Hi, please update to the newest [trans_web_demo.py](https://github.com/THUDM/LongWriter/blob/main/trans_web_demo.py).