DeBERTa
The implementation of DeBERTa
Hi, what are the hyperparameters used to fine-tune DeBERTa on SuperGLUE for each task, such as batch size, number of GPU cards, learning rate, etc.? I couldn't find the detailed parameters of each task in...
The paper says you add the absolute position embeddings after all Transformer layers, before the softmax layer for MLM; however, I could not find these parameters. Looking forward to your response...
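For readers trying to locate this in the code, here is a minimal sketch of what the paper describes: absolute position embeddings injected only after the last Transformer layer, just before the MLM softmax. Every class and parameter name below is hypothetical, not the repo's actual API.

```python
import torch
import torch.nn as nn

class EnhancedMaskDecoderSketch(nn.Module):
    """Illustrative sketch only: absolute position embeddings are added to the
    final hidden states, after all Transformer layers and right before the MLM
    softmax. Names and shapes are assumptions, not DeBERTa's real code."""

    def __init__(self, hidden_size=768, vocab_size=30522, max_position=512):
        super().__init__()
        self.abs_pos_embeddings = nn.Embedding(max_position, hidden_size)  # hypothetical
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        # hidden_states: [batch, seq_len, hidden], output of the last Transformer layer
        seq_len = hidden_states.size(1)
        positions = torch.arange(seq_len, device=hidden_states.device)
        # Absolute positions enter only here, before the MLM softmax.
        hidden_states = hidden_states + self.abs_pos_embeddings(positions)
        return self.lm_head(hidden_states)  # logits fed to the MLM softmax
```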
In disentangled_attention.py, pos_query_layer has 3 dimensions, but when p2p attention is selected, this line raises an IndexError:

```python
pos_query = pos_query_layer[:, :, att_span:, :]
```

Test code:

```python
import os
os.chdir('F:\\WorkSpace\\DeBERTa-master')
import numpy as...
```
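For context, the failure is simply four index expressions applied to a 3-D tensor. A minimal standalone reproduction (the shapes below are illustrative assumptions, not the repo's actual sizes):

```python
import torch

# Hypothetical shape: the report says pos_query_layer only has 3 dimensions.
pos_query_layer = torch.randn(12, 512, 64)
att_span = 256

try:
    # Four index expressions require a 4-D tensor, hence the IndexError.
    pos_query = pos_query_layer[:, :, att_span:, :]
except IndexError as e:
    print(e)  # "too many indices for tensor of dimension 3"
```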
An error occurs in DisentangledSelfAttention.forward() when query_states.size(1) > hidden_states.size(1): https://github.com/microsoft/DeBERTa/blob/master/DeBERTa/deberta/disentangled_attention.py

Line 165:

```python
p2c_att = torch.gather(p2c_att, dim=-2, index=pos_index.expand(p2c_att.size()[:2] + (pos_index.size(-2), key_layer.size(-2))))
```
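For reference, a minimal sketch of the gather mechanics on that line with toy shapes (all sizes below are illustrative assumptions, not the actual tensors): the index is expanded so its last dimension matches the key length, then rows are gathered along dim=-2, producing one row per query position.

```python
import torch

# Illustrative shapes only, not the repo's actual tensors.
p2c_att = torch.randn(2, 4, 6, 6)                       # [batch, heads, key_len, key_len]
pos_index = torch.zeros(2, 4, 3, 1, dtype=torch.long)   # [batch, heads, query_len, 1]

# Mirror of line 165: expand the index so its last dim matches key_len,
# then gather rows along dim=-2. The output has query_len rows.
index = pos_index.expand(p2c_att.size()[:2] + (pos_index.size(-2), p2c_att.size(-2)))
out = torch.gather(p2c_att, dim=-2, index=index)
print(out.shape)  # torch.Size([2, 4, 3, 6])
```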
`self.deberta = deberta.DeBERTa(pre_trained='base')` throws the following when pre_trained is 'base', 'large', or 'xlarge':

```
Traceback (most recent call last):
  File "/home/v-weishengli/Downloads/pycharm-community-2020.2.2/plugins/python-ce/helpers/pydev/pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/v-weishengli/Downloads/pycharm-community-2020.2.2/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile...
```
Does anyone know where to get them? Thank you.
Code:

```python
self.deberta = deberta.DeBERTa(pre_trained="/path/to/pretrained_dir/pytorch_model.bin")
self.deberta.apply_state()
```

Message:

```
File "/home/user/DeBERTa/DeBERTa/deberta/deberta.py", line 143, in key_match
    assert len(c)==1, (c, s, key)
AssertionError: ([], dict_keys(['deberta.embeddings.word_embeddings.weight', 'deberta.embeddings.LayerNorm.weight', 'deberta.embeddings.LayerNorm.bias', 'deberta.encoder.layer.0.attention.self.q_bias', 'deberta.encoder.layer.0.attention.self.v_bias', 'deberta.encoder.layer.0.attention.self.in_proj.weight', 'deberta.encoder.layer.0.attention.self.pos_proj.weight', 'deberta.encoder.layer.0.attention.self.pos_q_proj.weight',...
```
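The assertion shows every checkpoint key carrying a `deberta.` prefix that `key_match` fails to pair with a model parameter. A hedged workaround sketch, not an official fix: normalize the state dict by stripping that prefix before loading (the paths are placeholders).

```python
import torch

# Assumption: the mismatch is only the 'deberta.' prefix on every checkpoint
# key, as the AssertionError suggests. Strip it and re-save the state dict.
state = torch.load("/path/to/pretrained_dir/pytorch_model.bin", map_location="cpu")
state = {k[len("deberta."):] if k.startswith("deberta.") else k: v
         for k, v in state.items()}
torch.save(state, "/path/to/pretrained_dir/pytorch_model.stripped.bin")
```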
Hello there, are there any instructions on how to pretrain DeBERTa from scratch? Thanks
Just curious about the switch of tokenizer in V2: can you share why you switched? And what training settings for SentencePiece did you use to train the v2 SPM tokenizer? Was...
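To frame the question, this is what training an SPM tokenizer looks like with the SentencePiece API. The actual corpus, vocabulary size, and flags used for v2 are exactly what is being asked here, so every value below is an assumption for illustration only.

```python
import sentencepiece as spm

# Hedged sketch: none of these settings are confirmed as DeBERTa v2's.
spm.SentencePieceTrainer.train(
    input="corpus.txt",            # hypothetical training corpus
    model_prefix="deberta_v2_spm",
    vocab_size=128000,             # v2 models ship a ~128k vocab; exact value unknown
    model_type="unigram",          # assumption; SentencePiece's default
    character_coverage=0.9995,
)
```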
In `deberta.mlm`, `MaskedLayerNorm` is not imported from `deberta.ops`, and `PreLayerNorm` is undefined. Also, I'm not sure whether `deberta.mlm` contains the code for pretraining?
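A minimal sketch of the missing import, assuming `MaskedLayerNorm` is in fact defined in `deberta/ops.py` as the module path suggests (`PreLayerNorm` would still need its own definition):

```python
# Top of DeBERTa/deberta/mlm.py -- assumption: MaskedLayerNorm lives in ops.py
from .ops import MaskedLayerNorm
```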