
How to extend the window size during training step 2?

Open jiejie1993 opened this issue 2 years ago • 2 comments

https://github.com/deepseek-ai/DeepSeek-Coder#model-training describes "Further Pre-training using an extended 16K window size on an additional 200B tokens". How is the window size extended during training? Is it enough to modify `max_length` and `max_position_embeddings` in `config.json`, or is something else needed?

jiejie1993 avatar Jan 17 '24 09:01 jiejie1993

Please check our technical report: https://arxiv.org/pdf/2401.14196.pdf
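For readers following along: the report describes extending the context via RoPE reconfiguration (linear position scaling plus a larger base frequency) rather than only changing config fields. The sketch below is an illustrative, non-authoritative rendering of that idea; the exact dimensions and values are assumptions for demonstration, not the training code.

```python
# Illustrative sketch of RoPE linear position scaling, the general
# technique the report describes for context extension.
# Assumed values (not from the repo): dim=128, base=100000, factor=4.

def rope_angles(position, dim=128, base=100000.0, scaling_factor=1.0):
    """Rotary embedding angles for one token position.

    Linear scaling divides the position index by `scaling_factor`,
    squeezing a longer sequence into the position range the model
    saw during original pre-training.
    """
    pos = position / scaling_factor
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With a scaling factor of 4, position 16384 in the extended window
# produces the same angles as position 4096 did in the original window:
extended = rope_angles(16384, scaling_factor=4.0)
original = rope_angles(4096, scaling_factor=1.0)
```

After reconfiguring RoPE this way, the model is then further trained on long sequences so it adapts to the rescaled positions.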

guoday avatar Jan 30 '24 02:01 guoday

The technical report shows that only about 8B tokens were used for long-context training, while the README says 200B tokens were trained with the 16K window, so I am a little confused here. If pre-training really did use 200B tokens for 16K-window training, then when were the RoPE base frequency and scaling factor modified? Thank you.

Content in the technical report: "The model underwent an additional 1000 steps of training, using a batch size of 512 and a sequence length of 16K. "
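For reference, the numbers quoted from the report multiply out to roughly 8B tokens, which matches the report's long-context budget rather than the README's 200B figure:

```python
# Token budget implied by the report's long-context training setup:
# 1000 steps x batch size 512 x sequence length 16K.
steps = 1000
batch_size = 512
seq_len = 16 * 1024  # 16K tokens per sequence

tokens = steps * batch_size * seq_len
print(f"{tokens / 1e9:.1f}B tokens")  # ~8.4B
```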

@guoday

tangbo-sh avatar Jul 09 '24 06:07 tangbo-sh