Xuekai Zhu
Thank you very much, I will try it in the coming days. Your advice is very useful!
+1, if there is any update on the synthetic data in this paper, please @ me! Huge thanks!
Thank you very much for your generous sharing. I have submitted my application, and my email is [email protected]. I look forward to receiving your approval!
Yes, I found that if I want to train for more than 1 epoch, the config should have max_duration: 2ep. But when I want to use max tokens to control the...
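For reference, a minimal sketch of setting this at launch time. The config path is a placeholder, and the token-suffix syntax in the comments is my assumption from OLMo's TrainConfig, not something confirmed in this thread:

```bash
# Illustrative only: the config path is a placeholder, not from this thread.
# OLMo's train script accepts dotted-key overrides on top of the YAML config,
# so the epoch-based duration mentioned above can be set like this:
torchrun --nproc_per_node=4 scripts/train.py configs/official/OLMo-1B.yaml \
  --max_duration=2ep   # train for 2 epochs

# Assumption (unverified here): token-based durations use a 'T' suffix in
# newer OLMo versions, e.g. --max_duration=2e12T for ~2T tokens. Check your
# version's TrainConfig before relying on this.
```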
Not only with the BioMed data. I get the same results with the data you provided: https://olmo-data.org/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy
I use the following command to run OLMo without modifying the source code, so the default checkpoint-loading code from v0.3.0 and v0.4.0 is used. ``` torchrun --nproc_per_node=4 --master_port=29216...
> @Xuekai-Zhu Can you say more on what you mean by the "presence or absence" of that checkpoint? And can you share the code you're using for loading? You can...
Loosely speaking, v0.3.0 produces correct loss values, but v0.4.0 does not. Using the pretrained checkpoint results in even higher loss values, which is clearly an error...
Thank you very much! I think this might be a rather urgent bug since it leads to training errors. Reverting to the 0.3.0 version works for me now.
Yes, I agree. @ISEEKYAN @PeterSH6 I've kept the default bf16 dtype unchanged and added a new example script in the flowrl recipe that enables FP16 through configuration overrides: recipe/flowrl/run_flowrl_qwen2.5_7b_fp16.sh ```...
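Roughly, an override-based FP16 launch would look like the sketch below. This is not the contents of the actual run_flowrl_qwen2.5_7b_fp16.sh; the entrypoint module and the dotted config keys are assumptions about verl's hydra-style config surface and should be checked against the real recipe:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the actual recipe/flowrl/run_flowrl_qwen2.5_7b_fp16.sh.
# The entrypoint module and the override keys below are assumed names for
# verl's hydra-style config; check the real script for the exact ones.
python3 -m recipe.flowrl.main_flowrl \
  actor_rollout_ref.model.path=Qwen/Qwen2.5-7B \
  actor_rollout_ref.actor.fsdp_config.model_dtype=fp16 \
  actor_rollout_ref.rollout.dtype=float16
```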