ucaslei
Results: 3 issues of ucaslei
The fine-tuning code mentions covariates, but they are not actually used during training. Why is this the case? Additionally, various time features were processed during data handling,...
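### Describe the Question
Continued (incremental) pretraining of a 13B model on roughly 2B tokens of data, for one epoch, without PEFT, on 8 A800 GPUs is estimated to take 400 hours, far longer than the theoretical time. What could cause this, and how long should such a run normally take?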
question
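The "theoretical time" can be estimated with the common rule of thumb that training costs about 6 × parameters × tokens FLOPs. The sketch below assumes a dense BF16 peak of 312 TFLOPS per A800 (the same figure as the A100) and treats model FLOPs utilization (MFU) as a free parameter; both are assumptions, not measurements from the issue.

```python
# Back-of-the-envelope training-time estimate.
# Assumption: training FLOPs ~= 6 * n_params * n_tokens (dense transformer rule of thumb).
# Assumption: A800 dense BF16 peak of 312 TFLOPS per GPU.

SECONDS_PER_HOUR = 3600


def training_hours(n_params: float, n_tokens: float, n_gpus: int,
                   peak_flops_per_gpu: float, mfu: float) -> float:
    """Wall-clock hours for one pass over n_tokens at the given utilization (0 < mfu <= 1)."""
    total_flops = 6 * n_params * n_tokens
    achieved_flops_per_s = n_gpus * peak_flops_per_gpu * mfu
    return total_flops / achieved_flops_per_s / SECONDS_PER_HOUR


def implied_mfu(observed_hours: float, n_params: float, n_tokens: float,
                n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Utilization that would make the run take observed_hours."""
    ideal = training_hours(n_params, n_tokens, n_gpus, peak_flops_per_gpu, 1.0)
    return ideal / observed_hours


if __name__ == "__main__":
    N, D, GPUS, PEAK = 13e9, 2e9, 8, 312e12  # 13B params, 2B tokens, 8 GPUs
    for mfu in (1.0, 0.5, 0.4, 0.3):
        print(f"MFU {mfu:.0%}: {training_hours(N, D, GPUS, PEAK, mfu):.1f} h")
    print(f"Implied MFU for a 400 h run: {implied_mfu(400, N, D, GPUS, PEAK):.1%}")
```

Under these assumptions, even perfect utilization gives about 17 hours, and a 400-hour run corresponds to roughly 4% MFU, far below what well-tuned multi-GPU runs usually report, so the gap points to a throughput bottleneck rather than the inherent cost of the workload.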
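The qwen2 pretraining loss is very large (just over 10), and I can't tell what the problem is. llama-factory does not have this issue. I have tried both float32 and bf16, but the problem persists.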
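One way to put a loss of about 10 in context is to compare it with ln V, the cross-entropy (in nats) of a model that predicts uniformly over a vocabulary of size V. A loss near that baseline means the model is doing little better than chance, so it may be worth confirming that the pretrained weights were actually loaded. The vocabulary size below (151,936) is an assumption; check `vocab_size` in the config of the checkpoint actually used.

```python
import math


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy in nats of a predictor that assigns equal probability to every token."""
    return math.log(vocab_size)


def perplexity(loss_nats: float) -> float:
    """Perplexity corresponding to a mean cross-entropy loss in nats."""
    return math.exp(loss_nats)


if __name__ == "__main__":
    ASSUMED_VOCAB = 151936  # assumption: verify against the checkpoint's config
    print(f"Uniform baseline loss: {uniform_loss(ASSUMED_VOCAB):.2f} nats")
    print(f"Perplexity at loss 10: {perplexity(10.0):.0f}")
```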