Yushi Bai

102 comments by Yushi Bai

Please refer to the issue here: https://github.com/THUDM/LongWriter/issues/23

Hey, thanks for your submission! Our author team has already evaluated Gemini-2.0-Flash-Exp, and the results are released at https://longbench2.github.io/. We will validate your evaluation results on Qwen2.5-14B and update the...

Hi, for reasoning models such as OpenAI o1 and DeepSeek R1, the w/ CoT setting is not necessary, as these models output their thinking process automatically, whether prompted or not....

Hi, on my side with an H800 I can complete 10,000-character generation within 24 GB of GPU memory.

Please try adding this line after each generation to release unused GPU memory:

```python
torch.cuda.empty_cache()
```
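For context, a minimal sketch of where that call fits in a multi-round generation loop. The `generate_chunk` helper and the prompt list are hypothetical stand-ins for the actual model call; the import is guarded so the sketch also runs on machines without PyTorch:

```python
try:
    import torch
    HAS_TORCH = True
except ImportError:  # torch not installed; skip the cache release step
    HAS_TORCH = False

def generate_chunk(prompt):
    # Hypothetical stand-in for a model.generate(...) call.
    return f"output for {prompt}"

prompts = ["chapter 1", "chapter 2"]
outputs = []
for p in prompts:
    outputs.append(generate_chunk(p))
    # After each generation, return cached but unreferenced GPU blocks
    # to the driver so the next round starts with more free memory.
    if HAS_TORCH and torch.cuda.is_available():
        torch.cuda.empty_cache()
```

Note that `empty_cache()` only releases memory the caching allocator is no longer using; tensors still referenced by Python (e.g. past outputs kept on the GPU) must be dropped or moved to CPU first.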

Currently, our LongWriter-6k training data mainly contains the "Short Input + Long Output" type. The model may not generalize well to "Long Input + Long Output" tasks like yours.

We are working on "Long Input+Long Output" tasks, such as grounded long-form generation. Stay tuned!

We do not plan to open-source the current step_parallel method. Although it supports parallel generation, the model cannot see the earlier parts when producing the later ones, which greatly hurts the overall coherence of the final long-form output.

The training code we currently provide for the `GLM-4-9B` model requires `transformers==4.33.0`; newer transformers versions may cause errors. To support packing training, replace the original model's `modeling_chatglm.py` with the [modeling_chatglm.py](https://github.com/THUDM/LongWriter/blob/main/train/patch/modeling_chatglm.py) file provided under `patch/`.
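A hedged sketch of the setup steps, assuming the LongWriter repo has been cloned into the current directory; `$MODEL_DIR` is a placeholder for wherever your local GLM-4-9B checkpoint lives:

```shell
# Pin the transformers version the training code expects.
pip install "transformers==4.33.0"

# Overwrite the checkpoint's modeling file with the packing-training patch.
cp train/patch/modeling_chatglm.py "$MODEL_DIR/modeling_chatglm.py"
```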