
Error occurs when editing Baichuan-13B

Open hiyouga opened this issue 11 months ago • 2 comments

loss 3.28 = 3.28 + 0.0 avg prob of [Rishi Sunak] 0.0498
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan
loss nan = nan + nan avg prob of [Rishi Sunak] nan

The gradient of the delta weight becomes nan after the first backward pass.

By wrapping the backward pass with:

with torch.autograd.detect_anomaly():
    loss.backward()

we caught a runtime error raised by the script:

RuntimeError: Function 'MmBackward0' returned nan values in its 0th output.
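For context, a minimal sketch of how detect_anomaly surfaces this kind of failure (a toy computation, not the actual editing script): anomaly mode raises a RuntimeError naming the backward function that produced nan, instead of letting nan propagate silently through the gradients.

```python
import torch

# Toy example (not the Baichuan-13B editing script): sqrt has an infinite
# derivative at 0, and multiplying by 0 in the forward pass makes the
# backward computation evaluate 0 * inf = nan.
x = torch.tensor([0.0], requires_grad=True)

with torch.autograd.detect_anomaly():
    loss = (torch.sqrt(x) * 0).sum()  # forward value is a clean 0.0
    try:
        loss.backward()  # anomaly mode raises instead of propagating nan
    except RuntimeError as e:
        print(e)  # reports which backward function returned nan
```

Without the detect_anomaly context, the same backward call completes silently and x.grad is simply nan, which matches the symptom in the log above.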

I suppose it may be related to the alibi attention biases of Baichuan-13B.

hiyouga avatar Jul 11 '23 16:07 hiyouga

It may be caused by the alibi position encoding in the current implementation of the Baichuan-13B model. The alibi position encoding does not accept the attention mask, so it is incompatible with left-padding. We are trying to fix it by re-implementing the Baichuan-13B model.
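For intuition, a hedged sketch of why ignoring the attention mask under left-padding can yield nan (this is an illustration of the general mechanism, not the Baichuan code): if a query row ends up with every key position masked to -inf, the softmax normalizer is zero and every probability is 0/0 = nan, which then poisons the backward pass.

```python
import torch

# Hypothetical minimal sketch: an attention score row in which every key
# position has been masked out with -inf. exp(-inf) = 0 for each entry,
# so the softmax denominator is 0 and each probability is 0/0 = nan.
scores = torch.full((4,), float("-inf"))
probs = torch.softmax(scores, dim=-1)
print(torch.isnan(probs).all())  # every entry of the row is nan
```

One nan row like this is enough: once it enters the attention output, matrix multiplications spread nan to the loss and to every gradient, consistent with the 'MmBackward0' error reported above.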

hiyouga avatar Jul 13 '23 13:07 hiyouga

This problem has been fixed. Please replace the model file of Baichuan-13B with the updated version in [1] and rerun the editing script.

[1] https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/tests/modeling_baichuan.py

hiyouga avatar Jul 16 '23 10:07 hiyouga