Xing Long
Hello~ I recently read your brilliant paper, but I am confused about the BP problem mentioned in the introduction: `Moreover, this would also hinder the back-propagation for the prediction module, which needs to...
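Hello, in `documents/pretraining/Causal LM for Continual Pre-training.md` there is this sentence: `At input time, you only need to copy input_ids directly as the label`. Since the labels need to be shifted left by one position when computing the loss, could you tell me where this operation is performed? Is it in the trainer? But how would the trainer know that it is a causal loss?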
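For reference on the question above: in the causal LM heads of Hugging Face transformers, the shift is applied inside the model's forward pass when `labels` is passed, so the Trainer does not need to know the loss type. Whether the repository in question relies on that behavior is an assumption here. A minimal pure-Python sketch of such a shifted loss, where the helper name `shifted_causal_lm_loss` and the toy inputs are hypothetical:

```python
import math

def shifted_causal_lm_loss(logits, labels, ignore_index=-100):
    """Mean next-token cross-entropy: logits[t] is scored against labels[t + 1]."""
    shift_logits = logits[:-1]  # drop the last position: no token follows it
    shift_labels = labels[1:]   # drop the first token: nothing predicts it
    total, count = 0.0, 0
    for row, target in zip(shift_logits, shift_labels):
        if target == ignore_index:
            continue  # masked positions do not contribute to the loss
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # stable log-sum-exp
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0

input_ids = [2, 0, 1]
labels = list(input_ids)  # labels are an unshifted copy of input_ids

# Uniform logits over a 3-token vocabulary (illustrative values only).
uniform_logits = [[0.0, 0.0, 0.0] for _ in input_ids]
uniform_loss = shifted_causal_lm_loss(uniform_logits, labels)

# Logits that favor the *next* token at each position.
next_token_logits = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]
next_token_loss = shifted_causal_lm_loss(next_token_logits, labels)
```

Because the shift lives in the loss computation, the data pipeline can pass `labels` as a plain copy of `input_ids`.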
Hello, I am interested in the standard deviation of the activation and would like to know how the variance is calculated. Here are a few methods: 1. Calculate the variance...
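I have already changed model_path to the location of my downloaded model and updated transformers to the latest version. The error here is a bit strange, because the official repository does not contain this file either.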