Info on Deberta-v2-xlarge training infra
The paper discusses DeBERTa-base, DeBERTa-large, and the DeBERTa-1.5B model trained on V100 GPUs. How was DeBERTa-v2-xlarge trained? Were the settings for the xlarge model the same as those used for the large model in the paper? Since DeBERTa-v2-xlarge has roughly 900M parameters, was any tensor parallelism used during training?
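For context, the ~900M figure can be roughly reproduced from the published DeBERTa-v2-xlarge config (hidden size 1536, 24 layers, intermediate size 6144, vocabulary size 128100). A back-of-the-envelope sketch — ignoring biases, layer norms, and the disentangled-attention relative-position embeddings, so it slightly undercounts the ~884M reported on the model card:

```python
# Rough parameter-count estimate for DeBERTa-v2-xlarge.
# Config values are taken from the published model config; biases,
# layer norms, and relative-position parameters are omitted.
hidden = 1536
layers = 24
intermediate = 6144
vocab = 128100

embeddings = vocab * hidden                  # token embedding matrix
attention_per_layer = 4 * hidden * hidden    # Q, K, V, and output projections
ffn_per_layer = 2 * hidden * intermediate    # FFN up- and down-projections
transformer = layers * (attention_per_layer + ffn_per_layer)

total = embeddings + transformer
print(f"~{total / 1e6:.0f}M parameters")    # prints "~876M parameters"
```

This lands in the same ballpark as the quoted ~900M, which is why the question about model/tensor parallelism on 32 GB V100s is a reasonable one.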