Yushi Bai

102 comments by Yushi Bai

Please refer to the issue here: https://github.com/THUDM/LongWriter/issues/23

Hey, thanks for your submission! Our author team has already evaluated Gemini-2.0-Flash-Exp, and the results are released at https://longbench2.github.io/. We will validate your evaluation results on Qwen2.5-14B and update the...

Hi, for reasoning models such as OpenAI o1 and DeepSeek R1, the w/ CoT setting is not necessary, as these models output their thinking process automatically, whether prompted or not....

Hi, on my side with an H800 I can complete 10,000-character generation within 24 GB of GPU memory.

Please try adding this line after each generation to release unused GPU memory:

```python
torch.cuda.empty_cache()
```
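For context, a minimal sketch of where that call fits in a multi-round generation loop. The `generate_chunk` helper and the prompt list are hypothetical stand-ins for the actual model call; the import is guarded so the sketch also runs on machines without PyTorch:

```python
try:
    import torch
    HAS_TORCH = True
except ImportError:  # torch not installed; skip the cache release step
    HAS_TORCH = False

def generate_chunk(prompt):
    # Hypothetical stand-in for a model.generate(...) call.
    return f"output for {prompt}"

prompts = ["chapter 1", "chapter 2"]
outputs = []
for p in prompts:
    outputs.append(generate_chunk(p))
    # After each generation, return cached but unreferenced GPU blocks
    # to the driver so the next round starts with more free memory.
    if HAS_TORCH and torch.cuda.is_available():
        torch.cuda.empty_cache()
```

Note that `empty_cache()` only releases memory the caching allocator is no longer using; tensors still referenced by Python (e.g. past outputs kept on the GPU) must be dropped or moved to CPU first.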

Currently, our LongWriter-6k training data mainly contains the "Short Input + Long Output" type. The model may not generalize well to "Long Input + Long Output" tasks like yours.

We are working on "Long Input+Long Output" tasks, such as grounded long-form generation. Stay tuned!

We do not plan to open-source the current step_parallel method. Although it supports parallel generation, the model cannot see the earlier parts when producing the later ones, which greatly hurts the overall coherence of the final long-form output.

The training code we currently provide for the `GLM-4-9B` model requires `transformers==4.33.0`; newer transformers versions may cause errors. To support packing training, replace the original model's `modeling_chatglm.py` with the [modeling_chatglm.py](https://github.com/THUDM/LongWriter/blob/main/train/patch/modeling_chatglm.py) file provided under `patch/`.
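A hedged sketch of the setup steps, assuming the LongWriter repo has been cloned into the current directory; `$MODEL_DIR` is a placeholder for wherever your local GLM-4-9B checkpoint lives:

```shell
# Pin the transformers version the training code expects.
pip install "transformers==4.33.0"

# Overwrite the checkpoint's modeling file with the packing-training patch.
cp train/patch/modeling_chatglm.py "$MODEL_DIR/modeling_chatglm.py"
```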