Xuekai Zhu
Thank you very much, I will try it in the coming days. Your advice is very useful!
+1, if there is any update on the synthetic data in this paper, please @ me! Huge thanks!
Thank you very much for your generous sharing. I have submitted my application, and my email is [email protected]. I look forward to receiving your approval!
Yes, I found that if I want to train for more than 1 epoch, the config should have max_duration: 2ep. But when I want to use max tokens to control the...
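For reference, a minimal sketch of setting this at launch time. The config path is a placeholder, and the token-suffix syntax in the comments is my assumption from OLMo's TrainConfig, not something confirmed in this thread:

```bash
# Illustrative only: the config path is a placeholder, not from this thread.
# OLMo's train script accepts dotted-key overrides on top of the YAML config,
# so the epoch-based duration mentioned above can be set like this:
torchrun --nproc_per_node=4 scripts/train.py configs/official/OLMo-1B.yaml \
  --max_duration=2ep   # train for 2 epochs

# Assumption (unverified here): token-based durations use a 'T' suffix in
# newer OLMo versions, e.g. --max_duration=2e12T for ~2T tokens. Check your
# version's TrainConfig before relying on this.
```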
Not only with the BioMed data. I get the same results with the data you provided: https://olmo-data.org/preprocessed/olmo-mix/v1_5/gpt-neox-20b-pii-special/part-000-00000.npy
I use the following command to run OLMo without modifying the source code, so the default checkpoint-loading code from v0.3.0 and v0.4.0 is used. ``` torchrun --nproc_per_node=4 --master_port=29216...
> @Xuekai-Zhu Can you say more on what you mean by the "presence or absence" of that checkpoint? And can you share the code you're using for loading? You can...
Loosely speaking, v0.3.0 produces correct loss values, but v0.4.0 does not. Using the pretrained checkpoint results in even higher loss values, which is clearly an error...
Thank you very much! I think this might be a rather urgent bug since it leads to training errors. Reverting to the 0.3.0 version works for me now.
Yes, I agree. @ISEEKYAN @PeterSH6 I've kept the default bf16 dtype unchanged and added a new example script in the flowrl recipe that enables FP16 through configuration overrides: recipe/flowrl/run_flowrl_qwen2.5_7b_fp16.sh ```...
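Roughly, an override-based FP16 launch would look like the sketch below. This is not the contents of the actual run_flowrl_qwen2.5_7b_fp16.sh; the entrypoint module and the dotted config keys are assumptions about verl's hydra-style config surface and should be checked against the real recipe:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the actual recipe/flowrl/run_flowrl_qwen2.5_7b_fp16.sh.
# The entrypoint module and the override keys below are assumed names for
# verl's hydra-style config; check the real script for the exact ones.
python3 -m recipe.flowrl.main_flowrl \
  actor_rollout_ref.model.path=Qwen/Qwen2.5-7B \
  actor_rollout_ref.actor.fsdp_config.model_dtype=fp16 \
  actor_rollout_ref.rollout.dtype=float16
```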