AttributeError: 'Parameter' object has no attribute 'ds_status'

Open chenyu012 opened this issue 2 years ago • 6 comments

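I get an error when doing full-parameter fine-tuning with deepspeed + transformers. DeepSpeed is configured with ZeRO-3, and the error is raised during eval. Error details: AttributeError: 'Parameter' object has no attribute 'ds_status'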

chenyu012 avatar Oct 17 '23 06:10 chenyu012

same problem

bestpredicts avatar Oct 17 '23 09:10 bestpredicts

It looks like the Baichuan2 model triggers this problem in eval mode when DeepSpeed stage 3 is used: https://github.com/baichuan-inc/Baichuan2/issues/39#issuecomment-1710146497

bestpredicts avatar Oct 17 '23 09:10 bestpredicts

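The cause is that, in the code below, self.weight = nn.Parameter(nn.functional.normalize(self.weight)) replaces the parameter with a new object, which wipes out the attributes that DeepSpeed stage 3 attached to the original parameter.

The first version of the model, which does not normalize the head, does not have this problem.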

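import math

import torch
import torch.nn as nn


class NormHead(nn.Module):
    def __init__(self, hidden_size, vocab_size, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty((vocab_size, hidden_size)))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.first_flag = True

    def forward(self, hidden_states):
        if self.training:
            # Training: normalize on the fly, the stored parameter is untouched.
            norm_weight = nn.functional.normalize(self.weight)
        elif self.first_flag:
            # First eval call: the weight is normalized once and cached.
            self.first_flag = False
            # Problem line: this creates a brand-new Parameter object, so the
            # attributes DeepSpeed stage 3 set on the old one (ds_status) are lost.
            self.weight = nn.Parameter(nn.functional.normalize(self.weight))
            norm_weight = self.weight
        else:
            # Later eval calls reuse the cached normalized weight.
            norm_weight = self.weight
        return nn.functional.linear(hidden_states, norm_weight)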
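
One possible workaround, given the explanation above, is to stop rebinding self.weight and normalize on every forward pass instead. The sketch below is not taken from the Baichuan2 repository and has not been run under DeepSpeed: PatchedNormHead and keeps_marker_after_eval are hypothetical names, and the ds_status attribute is set by hand as a stand-in for whatever DeepSpeed stage 3 attaches to the parameter. It only shows that the original head drops attributes on the first eval call while the patched one keeps them.

```python
import math

import torch
import torch.nn as nn


class OriginalNormHead(nn.Module):
    """Head from the thread: rebinds self.weight on the first eval forward."""

    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.weight = nn.Parameter(torch.empty((vocab_size, hidden_size)))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.first_flag = True

    def forward(self, hidden_states):
        if self.training:
            norm_weight = nn.functional.normalize(self.weight)
        elif self.first_flag:
            self.first_flag = False
            # A brand-new Parameter object replaces the old one here.
            self.weight = nn.Parameter(nn.functional.normalize(self.weight))
            norm_weight = self.weight
        else:
            norm_weight = self.weight
        return nn.functional.linear(hidden_states, norm_weight)


class PatchedNormHead(nn.Module):
    """Hypothetical variant: normalizes on every call and never rebinds."""

    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.weight = nn.Parameter(torch.empty((vocab_size, hidden_size)))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, hidden_states):
        # The Parameter object stays the same in both train and eval mode.
        norm_weight = nn.functional.normalize(self.weight)
        return nn.functional.linear(hidden_states, norm_weight)


def keeps_marker_after_eval(head):
    # Stand-in for DeepSpeed's bookkeeping: tag the Parameter object by hand.
    head.weight.ds_status = "stand-in"
    head.eval()
    with torch.no_grad():
        head(torch.randn(2, head.weight.shape[1]))
    return hasattr(head.weight, "ds_status")


original_keeps = keeps_marker_after_eval(OriginalNormHead(8, 16))
patched_keeps = keeps_marker_after_eval(PatchedNormHead(8, 16))
print(original_keeps, patched_keeps)
```

The trade-off is one extra normalize call per eval forward pass, in exchange for never swapping out the parameter object.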

calvinzhan avatar Oct 26 '23 01:10 calvinzhan

I tracked it down, and this appears to be the cause: IMG14939

luoyujiaye avatar Nov 27 '23 10:11 luoyujiaye

I tracked it down, and this appears to be the cause: IMG14939

Does this fix it? And does the chat version not raise the error?

handsome-fish avatar Dec 06 '23 04:12 handsome-fish

I tracked it down, and this appears to be the cause: IMG14939

Great work, that solved the problem.

ywb2018 avatar Jan 23 '24 02:01 ywb2018