
Is the model structure exactly the same as GPT-2?

Open northfoxz opened this issue 4 years ago • 6 comments

Hi there, great work! I'm trying to port the Grover model into the huggingface/transformers repo. Is the model structure exactly the same as GPT-2's? Thanks for your reply!

northfoxz avatar Mar 04 '20 19:03 northfoxz

After reading the code, I found some structural differences between your implementation and OpenAI's, specifically in the normalization process:

OpenAI's implementation:

```python
def block(x, scope, *, past, hparams):
    with tf.variable_scope(scope):
        nx = x.shape[-1].value
        a, present = attn(norm(x, 'ln_1'), 'attn', nx, past=past, hparams=hparams)
        x = x + a
        m = mlp(norm(x, 'ln_2'), 'mlp', nx*4, hparams=hparams)
        x = x + m
        return x, present
```

- ln_1 of each block: norm applied to the input before attention
- ln_2 of each block: norm applied to the input before the fully-connected layer
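This pre-norm ordering can be summarized in a small framework-agnostic sketch (NumPy stand-ins for the real attention and MLP sublayers; this is an illustration, not OpenAI's code, and `layer_norm` here omits the learned gain and bias):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each vector over its last (hidden) dimension;
    # the learned gain/bias parameters are omitted for brevity
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def pre_norm_block(x, attn, mlp):
    # GPT-2 ordering: norm the input, apply the sublayer, add the residual
    x = x + attn(layer_norm(x))  # ln_1 -> attention -> residual
    x = x + mlp(layer_norm(x))   # ln_2 -> MLP -> residual
    return x
```

Note that the residual stream itself is never normalized inside the block; the norms only feed the sublayers.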

Grover's implementation:

```python
def residual_mlp_layer(x_flat, intermediate_size, initializer_range=0.02, hidden_dropout_prob=0.1):
    batch_size_seq_length, hidden_size = get_shape_list(x_flat, expected_rank=2)
    x_norm = layer_norm(x_flat, name='mlp_ln0')

    intermediate_output = tf.layers.dense(
        x_norm,
        intermediate_size,
        activation=gelu,
        kernel_initializer=create_initializer(initializer_range),
        name='intermediate',
    )

    output_for_residual = tf.layers.dense(
        intermediate_output,
        hidden_size,
        name='output',
        kernel_initializer=create_initializer(initializer_range))
    output_for_residual = dropout(output_for_residual, hidden_dropout_prob)

    layer_output = layer_norm(x_flat + output_for_residual, name='mlp_ln1')
    return layer_output
```

Grover applies two layer normalizations in the fully-connected layer: mlp_ln0 before the dense layers and mlp_ln1 after the residual addition.

That makes the structure different from OpenAI's implementation, so I'm unable to transfer this model to Hugging Face's repo.
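The two orderings can be contrasted in a small NumPy sketch (hypothetical stand-ins, not the real code; `layer_norm` omits the learned gain and bias). The extra post-residual norm changes the residual stream itself, which suggests the difference is more than naming:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize over the last (hidden) dimension; learned gain/bias omitted
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def openai_mlp_sublayer(x, mlp):
    # pre-norm only: the residual stream is left unnormalized
    return x + mlp(layer_norm(x))       # ln_2 -> MLP -> residual

def grover_mlp_sublayer(x, mlp):
    h = mlp(layer_norm(x))              # mlp_ln0 -> MLP
    return layer_norm(x + h)            # residual, then the extra mlp_ln1
```

With the same weights, the two sublayers produce different outputs, since Grover's version re-normalizes the residual sum before passing it on.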

northfoxz avatar Mar 05 '20 15:03 northfoxz

Sorry for taking a while to get to this one! I believe it's actually the same, since IIRC there's an extra layer normalization somewhere else in the OpenAI code. That said, the layer normalizations might not match up in terms of naming...

rowanz avatar Mar 30 '20 16:03 rowanz
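For context, the extra normalization mentioned above is most likely GPT-2's final ln_f, which is applied once after the whole stack of blocks rather than inside any block. A minimal sketch (illustrative NumPy only; `layer_norm` omits the learned gain and bias):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize over the last (hidden) dimension; learned gain/bias omitted
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gpt2_stack(x, blocks):
    for block in blocks:
        x = block(x)        # each block is pre-norm internally
    return layer_norm(x)    # ln_f: one final norm after the whole stack
```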

Hi @NorthFoxz. Were you able to determine whether the difference is only in naming, or whether it is structural? If the difference is only in the names, maybe a Grover model can be converted to make it compatible with Hugging Face.

EibrielInv avatar Aug 08 '20 16:08 EibrielInv

@EibrielInv Well, it is structural, with slight differences; you will have to modify the GPT-2 model code a bit to make it work.

northfoxz avatar Aug 09 '20 05:08 northfoxz

@NorthFoxz ~ Did you ever attempt to port it across into the huggingface/transformers repo by adjusting the GPT-2 code?

RinaldoG avatar Jul 03 '21 05:07 RinaldoG

Have you made any progress on this one, @NorthFoxz? The only thing I have found is this: https://huggingface.co/gagan3012/distilbert-fakenews-model-grover, but nothing else seems to be out there.

dsvilarkovic avatar Apr 06 '22 17:04 dsvilarkovic