                        DeBERTa can't load some parameters
System Info
- transformers version: 4.21.1
- Platform: Linux-5.4.0-81-generic-x86_64-with-glibc2.31
- Python version: 3.9.12
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.11.0+cu113 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: 
- Using distributed or parallel set-up in script?: 
Who can help?
@LysandreJik
Information
- [X] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
- Script
from transformers import pipeline

text = "The capital of France is [MASK]"
# Building the fill-mask pipeline on microsoft/deberta-base triggers the
# warning below while the model weights are loaded.
mlm_pipeline = pipeline('fill-mask', model='microsoft/deberta-base', tokenizer='microsoft/deberta-base')
print(mlm_pipeline(text))
- Warning Message
Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForMaskedLM: ['lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.dense.weight', 'deberta.embeddings.position_embeddings.weight', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForMaskedLM were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
- Output
The capital of France isumption
The capital of France is�
The capital of France iszag
The capital of France isreply
The capital of France isnerg
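For comparison (my own addition, not part of the original output): running the same pipeline with a checkpoint whose MLM head does load, for example bert-base-uncased, which also uses the [MASK] token, should give plausible completions:

from transformers import pipeline

# Same prompt, but with a checkpoint whose MLM head weights load cleanly.
mlm_pipeline = pipeline('fill-mask', model='bert-base-uncased', tokenizer='bert-base-uncased')
print(mlm_pipeline("The capital of France is [MASK]."))
# Top candidates should be real words (e.g. "paris"), unlike the output above.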
Expected behavior
When the DeBERTa model is loaded with transformers, the weights needed for the MLM head (as well as the positional embedding weights) do not seem to be loaded. There are some issues similar to mine:
- https://github.com/huggingface/transformers/issues/15216
- https://github.com/huggingface/transformers/issues/15673
- https://github.com/microsoft/DeBERTa/issues/74
But the problem does not seem to be resolved yet. Could you take a look?
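As a minimal sketch for anyone triaging this (my own check, assuming from_pretrained's output_loading_info flag), the key mismatch from the warning can also be inspected programmatically:

from transformers import DebertaForMaskedLM

model, info = DebertaForMaskedLM.from_pretrained(
    "microsoft/deberta-base", output_loading_info=True
)
# Checkpoint weights the model had no place for (the original MLM head):
print(info["unexpected_keys"])   # e.g. lm_predictions.lm_head.*
# Model parameters left randomly initialized (the new, untrained head):
print(info["missing_keys"])      # e.g. cls.predictions.*

This mirrors the warning above: the checkpoint stores the head under lm_predictions.lm_head.*, while DebertaForMaskedLM expects it under cls.predictions.*, so the head ends up randomly initialized.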
#18674 should fix this. Thanks for reporting!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.