minimal-llama
Does this training process not consider the decoder attention_mask?
I see that:
    def model_forward(model, inputs):
        h = inputs
        h = h.to(model.base_model.model.model.embed_tokens.weight.device)
        h = model.base_model.model.model.embed_tokens(h)
        for layer in model.base_model.model.model.layers:
            h = h.to(layer.input_layernorm.weight.device)
            # The decoder layer is called without an attention_mask here.
            h = layer(h)[0]
        h = h.to(model.base_model.model.model.norm.weight.device)
        h = model.base_model.model.model.norm(h)
        h = model.base_model.model.lm_head(h)
        return h
Doesn't this mean the output at each position is computed over the whole sequence, since no attention mask is passed to the decoder layers?
Maybe you need to add _prepare_decoder_attention_mask(h) and pass the resulting mask to each layer to avoid this...
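For illustration, something along these lines might work. This is only a rough sketch, assuming an older transformers LLaMA implementation whose decoder layers accept an additive attention_mask of shape (batch, 1, seq_len, seq_len); the name model_forward_with_mask and the inline mask construction are placeholders of mine, not the repo's code, and the exact signature of _prepare_decoder_attention_mask depends on the transformers version:

```python
import torch

def model_forward_with_mask(model, inputs):
    # Sketch only: same pipeline as model_forward, but with an explicit
    # causal mask passed to every decoder layer.
    base = model.base_model.model.model
    h = inputs.to(base.embed_tokens.weight.device)
    h = base.embed_tokens(h)

    bsz, seq_len, _ = h.shape
    # Additive causal mask: 0 where a position may attend, a large negative
    # value above the diagonal so position i cannot see positions j > i.
    mask = torch.full((seq_len, seq_len), torch.finfo(h.dtype).min, dtype=h.dtype)
    mask = torch.triu(mask, diagonal=1)
    mask = mask[None, None, :, :].expand(bsz, 1, seq_len, seq_len)

    for layer in base.layers:
        h = h.to(layer.input_layernorm.weight.device)
        # Older LlamaDecoderLayer.forward takes attention_mask as a keyword argument.
        h = layer(h, attention_mask=mask.to(h.device))[0]

    h = h.to(base.norm.weight.device)
    h = base.norm(h)
    return model.base_model.model.lm_head(h)
```

If the batch also contains padding, the padding mask would need to be merged into this causal mask as well, which is (as far as I know) what _prepare_decoder_attention_mask does internally in older transformers versions.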