Was the 72B model pretrained entirely with a 32k context window?

Open HaoshengZou opened this issue 1 year ago • 6 comments

Does seq_length in config.json fully reflect the context window used during pretraining? Was the 72B model trained from scratch with a 32k window?

HaoshengZou avatar Dec 01 '23 15:12 HaoshengZou

It was trained at 8k and can extrapolate to 32k.

liudayiheng avatar Dec 03 '23 08:12 liudayiheng

Thanks for the reply!

HaoshengZou avatar Dec 03 '23 12:12 HaoshengZou

It was trained at 8k and can extrapolate to 32k.

Was the base already 1000000 during the 8k training?

boxiaowave avatar Dec 05 '23 16:12 boxiaowave

It was trained at 8k and can extrapolate to 32k.

Was the base already 1000000 during the 8k training?

I hope this issue can be reopened.

TissueC avatar Dec 10 '23 10:12 TissueC

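It was trained at 8k and can extrapolate to 32k.

Was the base already 1000000 during the 8k training?

I hope this issue can be reopened.

My understanding is that yes, the same base was used throughout. There is no need to change the base between pretraining stages; doing so would only cause a sudden spike in the loss. I would like an official reply before this issue is closed. Thanks! @liudayiheng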

HaoshengZou avatar Dec 11 '23 06:12 HaoshengZou
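
For readers unfamiliar with the "base" discussed above: in the standard rotary position embedding (RoPE) formulation, rotary pair i rotates at frequency base^(-2i/d), so its wavelength in tokens is 2π · base^(2i/d). The sketch below only illustrates how a larger base stretches those wavelengths. The head dimension of 128 and the comparison bases are assumptions chosen for illustration, and nothing here describes how Qwen-72B was actually trained.

```python
import math


def rope_wavelengths(head_dim, base):
    """Wavelength in tokens of each rotary pair under standard RoPE.

    Pair i rotates at frequency base ** (-2 * i / head_dim), so one full
    rotation takes 2 * pi * base ** (2 * i / head_dim) token positions.
    """
    return [
        2 * math.pi * base ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def longest_wavelength(head_dim, base):
    """Wavelength of the slowest-rotating pair (the last one)."""
    return rope_wavelengths(head_dim, base)[-1]


if __name__ == "__main__":
    head_dim = 128  # assumed head dimension, for illustration only
    for base in (10_000, 1_000_000):  # illustrative bases to compare
        tokens = longest_wavelength(head_dim, base)
        print(f"base={base}: longest wavelength is about {tokens:.0f} tokens")
```

A larger base makes every pair after the first rotate more slowly, which is why the base is usually discussed together with the target context length.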

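What I would really like to know is whether Qwen 72B's extrapolation training recipe is similar to Code Llama's: a long pretraining run on short text first, then changing the base to 100000 and continuing pretraining on 8k long text?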

boxiaowave avatar Dec 13 '23 12:12 boxiaowave

We are not able to disclose related information. Thank you for your understanding.

jklj077 avatar Jan 02 '24 13:01 jklj077