Xing Long

4 issues by Xing Long

Hello~ I recently read your brilliant paper, but I am confused about the BP (back-propagation) problem mentioned in the introduction: `Moreover, this would also hinder the back-propagation for the prediction module, which needs to...

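Hello, in `documents/pretraining/Causal LM for Continual Pre-training.md` there is this sentence: `at input time, just copy input_ids directly as the labels`. Since the labels need to be shifted left by one position when the loss is computed, where is that shift performed? Is it in the Trainer? And if so, how does the Trainer know that this is a causal LM loss?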
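For context on the question above: in Hugging Face `transformers`, the `*ForCausalLM` model classes perform this shift themselves inside `forward` when `labels` is passed, and by default the `Trainer` simply uses the `loss` the model returns, so it does not need to know that the loss is causal. The following is a minimal pure-Python sketch of that shift, not the library's code; `causal_lm_loss` is a hypothetical helper, the toy logits are made up, and `-100` follows the common convention for masked label positions.

```python
import math


def causal_lm_loss(logits, labels, ignore_index=-100):
    """Mean next-token cross-entropy (hypothetical helper for illustration).

    logits: list of seq_len rows, each a list of vocab_size scores;
            logits[t] is the prediction made at position t.
    labels: list of seq_len token ids, a plain copy of input_ids.
    """
    # The shift: the prediction at position t is scored against token t + 1,
    # so drop the last row of logits and the first label.
    shift_logits = logits[:-1]
    shift_labels = labels[1:]

    total, count = 0.0, 0
    for scores, target in zip(shift_logits, shift_labels):
        if target == ignore_index:
            continue  # masked positions (e.g. padding) contribute nothing
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]  # negative log-likelihood of target
        count += 1
    return total / count if count else 0.0


input_ids = [2, 0, 1]
labels = list(input_ids)  # labels are just a copy of input_ids
uniform = [[0.0, 0.0, 0.0] for _ in input_ids]
print(round(causal_lm_loss(uniform, labels), 4))  # 1.0986, i.e. log(3)
```

Because the shift lives in the loss computation, the data pipeline only has to duplicate `input_ids`; shifting the labels there as well would misalign them by two positions.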

Hello, I am interested in the standard deviation of the activation and would like to know how the variance is calculated. Here are a few methods: 1. Calculate the variance...

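![image](https://github.com/user-attachments/assets/3f59a672-b776-4ac5-8b09-d1662d63d7c7) I have already changed `model_path` to the location of my downloaded model, and `transformers` is updated to the latest version. The error here is a bit strange, because this file does not exist in the official repository either. ![image](https://github.com/user-attachments/assets/ebb341b7-0272-4adc-9d31-c6ddba3a2024)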