Yushi Bai
Your test results look quite close to ours; I believe the difference is basically within random error. How do you truncate the input?
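For reference, the LongBench `pred.py` script truncates from the middle of the sequence, keeping the head and tail of the tokenized input. A minimal sketch of that idea, assuming a Hugging Face tokenizer; `truncate_middle` is an illustrative name, not the repo's function:

```python
# Sketch of middle truncation: keep the first and last `half` tokens,
# drop the middle. Assumes a Hugging Face tokenizer.
def truncate_middle(prompt: str, tokenizer, max_length: int) -> str:
    input_ids = tokenizer.encode(prompt, add_special_tokens=False)
    if len(input_ids) <= max_length:
        return prompt
    half = max_length // 2
    kept = input_ids[:half] + input_ids[-half:]
    return tokenizer.decode(kept, skip_special_tokens=True)
```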
Hi! Please refer to Section 4.2 of our paper for the details of DPO. We use the same codebase as [ChatGLM-RLHF](https://arxiv.org/abs/2404.00934). We currently have no plans to release the code...
Hi, our DPO code is based on Megatron-LM.
The `shift_weights` here have already been normalized: the weights within each sample sum to 1.
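For illustration, a minimal sketch of that per-sample normalization, assuming `shift_weights` is a `[batch, seq_len]` tensor of non-negative token weights (the shape and function name are assumptions, not the actual training code):

```python
import torch

# Illustrative per-sample normalization: scale each row so that its
# weights sum to 1. The clamp guards against all-zero rows.
def normalize_weights(shift_weights: torch.Tensor) -> torch.Tensor:
    row_sums = shift_weights.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    return shift_weights / row_sums
```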
The GPT-3.5-Turbo-16k model evaluated in our paper has been deprecated. You can try gpt-3.5-turbo-0125 (16k) or the more recent gpt-4o-mini (128k), according to OpenAI's model list (https://platform.openai.com/docs/models).
Right. We didn't provide code for evaluating API models. You can modify the [get_pred()](https://github.com/THUDM/LongBench/blob/main/pred.py#L51) function to do so.
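A minimal sketch of what the replacement call inside `get_pred()` could look like, assuming the OpenAI v1 Python client; `api_generate` is a hypothetical helper, not part of the repo:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical helper: swaps the local-model generation inside
# get_pred() for a chat-completion API call.
def api_generate(prompt: str, model: str = "gpt-4o-mini", max_tokens: int = 512) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,        # deterministic-ish decoding for reproducible scores
        max_tokens=max_tokens,  # cap on generated tokens, not on context length
    )
    return response.choices[0].message.content
```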
1. We recommend doing long-context alignment fine-tuning (SFT, DPO) on a base model that has already undergone length extension.
2. GPU memory usage depends on the sequence length; for example, in our paper, training at 64k length with ZeRO-3 requires 80GB of GPU memory (a sketch of such a config follows this list).
3. If your base model has already been continually pre-trained on longer sequences (length extension), fine-tuning alone is enough; otherwise you need to do that continual pre-training first.
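For point 2, a minimal sketch of a DeepSpeed ZeRO-3 configuration for long-sequence training; all values are illustrative placeholders, not our exact setup:

```python
# Illustrative DeepSpeed ZeRO-3 settings for long-sequence fine-tuning.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # long sequences force tiny per-GPU batches
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                        # shard params, grads, and optimizer states
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```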
Hi, I suspect a misalignment in the chat prompt template, but I'm not sure how `client.chat` deals with the chat template. Can you provide more details?
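One way to check on your side, assuming a Hugging Face tokenizer with a chat template (the model name below is just a placeholder): render the template locally and compare it against what `client.chat` actually sends. If the server also applies a template, an already-templated string gets double-wrapped.

```python
from transformers import AutoTokenizer

# Placeholder model name; substitute the checkpoint you are serving.
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
messages = [{"role": "user", "content": "Hello"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # compare against the request body that client.chat produces
```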
Hi, Long (>128k) is just a subset of the evaluation data: the set of all test samples longer than 128k tokens. For evaluation on all data we use `--max_model_len 131072` and truncate sequences that exceed 128k tokens.
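A minimal sketch of that setup with vLLM's offline `LLM` API; the model path, prompt, and sampling values are placeholders:

```python
from vllm import LLM, SamplingParams

# Cap the context window at 128k tokens; inputs longer than this must be
# truncated before generation.
llm = LLM(model="path/to/model", max_model_len=131072)
params = SamplingParams(temperature=0.0, max_tokens=512)
outputs = llm.generate(["<your long prompt here>"], params)
print(outputs[0].outputs[0].text)
```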
Hi, please update to the latest [trans_web_demo.py](https://github.com/THUDM/LongWriter/blob/main/trans_web_demo.py).