dataless-model-merging How to implement RegMean for GPT-like model?


Open A11en0 opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe. The Hugging Face GPT implementation differs from T5 and RoBERTa: it computes the self-attention query, key, and value projections in parallel through a single fused layer, as shown below:

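def __init__(self):
        # One fused projection produces query, key, and value together.
        self.c_attn = Conv1D(n_state * 3, nx)
        ...

def forward(self, x):
        x = self.c_attn(x)  # last dimension is now 3 * n_state
        # Split the fused output back into its three parts.
        query, key, value = x.split(self.split_size, dim=2)
        query = self.split_heads(query)
        key = self.split_heads(key, k=True)
        value = self.split_heads(value)
        ...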
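To make the shapes concrete, here is a minimal sketch of the fused projection in plain PyTorch. It assumes the Conv1D weight layout of [in_features, out_features] (the transpose of torch.nn.Linear); the small sizes and variable names are illustrative stand-ins, not values from the library.

```python
import torch

hidden = 8  # illustrative stand-in for the 1024 in this issue

# Assumption: Conv1D keeps its weight as [in_features, out_features],
# whereas torch.nn.Linear keeps [out_features, in_features].
c_attn_weight = torch.randn(hidden, 3 * hidden)
c_attn_bias = torch.zeros(3 * hidden)

x = torch.randn(1, 5, hidden)             # [batch, seq, hidden]
fused = x @ c_attn_weight + c_attn_bias   # [batch, seq, 3 * hidden]

# Splitting along the last dimension recovers query, key, and value.
query, key, value = fused.split(hidden, dim=2)
print(c_attn_weight.shape, fused.shape, query.shape)
```

So the single c_attn parameter is three times as wide as its input, which is what any per-layer merging code has to account for.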

With this implementation, RegMean fails in the regmean_merge() function at line 163, gram_m_ws.append(torch.matmul(param_grams, param.transpose(0, 1))), because the matrix dimensions do not match: param_grams is [1024, 1024] and param is [1024, 1024*3], so param.transpose(0, 1) is [1024*3, 1024] and cannot be multiplied by a [1024, 1024] matrix from the left.
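One possible workaround, sketched below, is to skip the transpose for Conv1D-style weights, since they are already stored as [in, out]. This is not the repository's regmean_merge(); the helper name regmean_merge_weight and the weight_is_in_out flag are hypothetical, and the sketch leaves out refinements such as scaling the off-diagonal Gram entries. It applies the closed form W = (sum_i G_i)^-1 sum_i G_i W_i, with G_i the Gram matrix of the layer inputs.

```python
import torch

def regmean_merge_weight(weights, grams, weight_is_in_out):
    """Hypothetical helper: closed-form RegMean for one linear layer.

    weights: list of per-model weight tensors for the same layer.
    grams: list of [in, in] Gram matrices (X^T X of the layer inputs).
    weight_is_in_out: True for Conv1D-style [in, out] weights,
                      False for nn.Linear-style [out, in] weights.
    """
    sum_gram = torch.stack(grams).sum(dim=0)
    gram_m_ws = []
    for w, g in zip(weights, grams):
        # Bring every weight to [in, out] before multiplying by the Gram.
        w_in_out = w if weight_is_in_out else w.transpose(0, 1)
        gram_m_ws.append(torch.matmul(g, w_in_out))
    sum_gram_m_ws = torch.stack(gram_m_ws).sum(dim=0)
    # Solve (sum G) W = sum (G W) instead of forming an explicit inverse.
    merged = torch.linalg.solve(sum_gram, sum_gram_m_ws)  # [in, out]
    return merged if weight_is_in_out else merged.transpose(0, 1)

# Usage with small illustrative sizes (8 stands in for 1024).
torch.manual_seed(0)
inputs = [torch.randn(32, 8, dtype=torch.float64) for _ in range(2)]
grams = [x.T @ x for x in inputs]                       # each [8, 8]
c_attn_weights = [torch.randn(8, 24, dtype=torch.float64) for _ in range(2)]
merged = regmean_merge_weight(c_attn_weights, grams, weight_is_in_out=True)
print(merged.shape)
```

The Gram matrices here must come from the inputs to c_attn, so they stay [hidden, hidden] while the merged weight keeps the fused [hidden, 3 * hidden] shape.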

Jul 06 '23 08:07 A11en0
