Data2Vec: missing lm_head in text model
Hi,
after trying to convert my own pretrained model checkpoint to the Transformers library with the following code snippet:
# LM Head
model.lm_head.dense.weight = data2vec_model.encoder.lm_head.dense.weight
model.lm_head.dense.bias = data2vec_model.encoder.lm_head.dense.bias
model.lm_head.layer_norm.weight = data2vec_model.encoder.lm_head.layer_norm.weight
model.lm_head.layer_norm.bias = data2vec_model.encoder.lm_head.layer_norm.bias
model.lm_head.decoder.weight = data2vec_model.encoder.lm_head.weight
model.lm_head.decoder.bias = data2vec_model.encoder.lm_head.bias
it seems that the lm_head is missing from the encoder in the current implementation.
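Running the snippet fails with an AttributeError on the first lm_head access, since the attribute simply does not exist. A quick check (just a sketch, reusing the variable name from the snippet above) shows which heads the loaded fairseq model actually exposes:
# The officially released text model passes the lm_head check; my pretrained checkpoint does not.
print(hasattr(data2vec_model.encoder, "lm_head"))          # False for my checkpoint
print(hasattr(data2vec_model.encoder, "regression_head"))  # True for my checkpoint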
Here's a model diff between the officially released text model and my own pretrained one:
Official text model:
(lm_head): RobertaLMHead(
(dense): Linear(in_features=768, out_features=768, bias=True)
(layer_norm): FusedLayerNorm(torch.Size([768]), eps=1e-05, elementwise_affine=True)
)
Pretrained one:
# Missing, but there's a regression head?
(regression_head): Sequential(
(0): Linear(in_features=768, out_features=1536, bias=True)
(1): GELU(approximate=False)
(2): Linear(in_features=1536, out_features=768, bias=True)
)
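To rule out a printing issue, the heads can also be checked directly in the raw checkpoint (small sketch, assuming the usual fairseq checkpoint layout where the weights live under the "model" key):
import torch

# List every head-related parameter stored in the fairseq checkpoint.
state = torch.load("checkpoint.pt", map_location="cpu")
for name, tensor in state["model"].items():
    if "lm_head" in name or "regression_head" in name:
        print(name, tuple(tensor.shape))
For my checkpoint this only lists encoder.regression_head.* parameters and no encoder.lm_head.* at all, which matches the module dump above.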
After looking at the Data2VecTextEncoder, it seems that build_lm_head is never called:
https://github.com/facebookresearch/fairseq/blob/5307a0e078d7460003a86f4e2246d459d4706a1d/examples/data2vec/models/data2vec_text.py#L280-L323
However, here's the corresponding call in the RobertaEncoder:
https://github.com/facebookresearch/fairseq/blob/5307a0e078d7460003a86f4e2246d459d4706a1d/fairseq/models/roberta/model.py#L555-L564
My questions are now:
- is this a bug and is the lm_head really missing in the current implementation?
- what is the main intention/function of the introduced regression_head (that does not exist in the released text model)?
- if it's not a bug, how can the Transformers conversion be fixed to get #4534 working (see the sketch below)?
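For the last point, the only workaround I could come up with so far is to rebuild the missing head the same way RobertaEncoder.build_lm_head does and tie it to the input embeddings. This is only a sketch: the RobertaLMHead import path and the encoder.sentence_encoder.embed_tokens attribute path are assumptions on my side, and dense/layer_norm would end up randomly initialized, which is probably not what we want for a faithful conversion:
from fairseq.models.roberta.model import RobertaLMHead

# Attach a freshly built LM head, tied to the input embeddings,
# mirroring what RobertaEncoder.build_lm_head does for the released model.
embed_weight = data2vec_model.encoder.sentence_encoder.embed_tokens.weight  # assumed attribute path
data2vec_model.encoder.lm_head = RobertaLMHead(
    embed_dim=embed_weight.size(1),
    output_dim=embed_weight.size(0),
    activation_fn="gelu",
    weight=embed_weight,
)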
Many thanks in advance!
Hi @alexeib, sorry for bothering you again, but could you take a look at the missing lm_head? I could also share the pretrained checkpoint if necessary :hugs:
I have this problem, too.
I am trying to convert my own pretrained data2vec-text model to the Hugging Face format, but it looks like there are many parameter mismatches in the code. (Because of an AttributeError, I import the model (Data2vecTextModel) using data2vec-text from the fairseq library, not from the transformers library as suggested.)
It seems that many troubleshooting parts are included in 'convert_data2vec_text_original_pytorch_checkpoint_to_pytorch.py'.
Anyway, I'd like to know how to transfer the 'regression_head' and how to deal with the 'lm_head', as @stefan-it asked.
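In case it helps to pin down the mismatches, this is roughly how I compare the parameter names on both sides (sketch only; the checkpoint path is a placeholder and I just use a default config on the Hugging Face side):
import torch
from transformers import Data2VecTextConfig, Data2VecTextForMaskedLM

# Compare parameter names between the fairseq checkpoint and the HF model
# to see which ones have no counterpart (e.g. regression_head.* vs. lm_head.*).
fairseq_state = torch.load("checkpoint.pt", map_location="cpu")["model"]
hf_model = Data2VecTextForMaskedLM(Data2VecTextConfig())
hf_names = {name for name, _ in hf_model.named_parameters()}

print("fairseq head params:", sorted(n for n in fairseq_state if "head" in n))
print("HF lm_head params:", sorted(n for n in hf_names if "lm_head" in n))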