FastEdit
Error occurs when editing Baichuan-13B
loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
The gradient of the delta weight becomes NaN after the first backward operation.
By using:
with torch.autograd.detect_anomaly():
    loss.backward()
we caught the following runtime error:
RuntimeError: Function 'MmBackward0' returned nan values in its 0th output.
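For reference, here is a minimal, self-contained version of that anomaly-detection wrapper. The linear layer and MSE loss below are only placeholders standing in for the delta-weight module and the actual FastEdit editing objective.

import torch

# Toy stand-ins; replace with the edited module and the real editing loss.
model = torch.nn.Linear(16, 16)
inputs = torch.randn(4, 16)
targets = torch.randn(4, 16)

with torch.autograd.detect_anomaly():
    # Anomaly mode records the forward stack of every op, so when a backward
    # function (e.g. 'MmBackward0') produces NaN, the error points back to
    # the forward call that created it.
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()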
I suspect it may be related to the ALiBi attention masks in Baichuan-13B.
It may be caused by the ALiBi position encoding in the current implementation of the Baichuan-13B model. The ALiBi position encoding does not accept an attention mask, so it is incompatible with left-padding. We are trying to fix this by re-implementing the Baichuan-13B model.
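For illustration, here is a minimal sketch of the idea behind such a fix: fold the padding attention mask into the ALiBi bias so that left-padded positions are never attended to. This is not the actual patched modeling_baichuan.py; the function names, shapes, and slope formula are assumptions based on the standard ALiBi formulation.

import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes: a geometric sequence 2^(-8h/num_heads), as in the ALiBi
    # paper (assumes num_heads is a power of two).
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # Relative distance (key position - query position); negative for past keys,
    # so farther-away keys receive a larger penalty.
    positions = torch.arange(seq_len)
    distances = (positions[None, :] - positions[:, None]).float()   # (seq, seq)
    return slopes[:, None, None] * distances[None, :, :]            # (heads, seq, seq)

def build_attention_bias(attention_mask: torch.Tensor, num_heads: int) -> torch.Tensor:
    # attention_mask: (batch, seq), 1 for real tokens, 0 for padding
    # (with left-padding the zeros sit at the start of each row).
    batch, seq_len = attention_mask.shape
    bias = alibi_bias(num_heads, seq_len).unsqueeze(0).expand(batch, num_heads, seq_len, seq_len)
    # Causal mask: a query may only attend to itself and earlier positions.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Key-padding mask: padded key positions must never be attended to.
    key_padding = attention_mask[:, None, None, :].bool()           # (batch, 1, 1, seq)
    allowed = causal[None, None, :, :] & key_padding
    # Note: rows belonging to padded *query* positions end up fully masked; a real
    # implementation must keep their softmax finite (e.g. by zeroing those outputs),
    # otherwise the NaNs described above reappear in the backward pass.
    return bias.masked_fill(~allowed, torch.finfo(bias.dtype).min)

The resulting (batch, heads, seq, seq) bias is added to the attention scores before the softmax, which makes ALiBi usable with left-padded batches.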
This problem has been fixed. Please replace the Baichuan-13B model file with the updated version in [1] and rerun the editing script.
[1] https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/tests/modeling_baichuan.py