DeBERTa

Pre-training times: v2 vs. v3

Open · stefan-it opened this issue 2 years ago · 1 comment

Hi,

it would be very interesting to see a comparison of pre-training times for DeBERTa v2 versus the recently released v3, which uses the RTD (replaced token detection) objective.

The v2 paper mentioned pre-training times:

[Image: table of pre-training times from the DeBERTa v2 paper]

But what about the v3 base, large, and multilingual models? :thinking:

stefan-it avatar Apr 11 '22 12:04 stefan-it

I was trying to pre-train DeBERTa v2 with the RTD objective (but without the gradient-disentangled embedding sharing). I noticed that it runs much slower than ELECTRA (which is BERT-based).
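For context, here is a minimal sketch of the RTD objective that ELECTRA introduced and DeBERTa v3 adopts: a generator samples replacements at masked positions, and the discriminator is trained with a per-token binary loss to detect which tokens were replaced. The function name `rtd_labels`, the toy tensors, and the random logits are all illustrative assumptions, not code from the DeBERTa repo.

```python
# Hedged sketch of the RTD (replaced token detection) objective:
# a generator fills masked positions, and a discriminator predicts,
# per token, whether each token was replaced.
import torch
import torch.nn.functional as F

def rtd_labels(original_ids, masked_positions, generator_logits):
    """Sample generator predictions at masked positions and build the
    binary replaced/original labels for the discriminator.
    (Hypothetical helper for illustration only.)"""
    sampled = torch.distributions.Categorical(logits=generator_logits).sample()
    corrupted = original_ids.clone()
    corrupted[masked_positions] = sampled[masked_positions]
    labels = (corrupted != original_ids).float()  # 1 = token was replaced
    return corrupted, labels

# Toy example: batch of 1, sequence length 6, vocab of 10.
ids = torch.tensor([[3, 1, 4, 1, 5, 9]])
mask = torch.zeros_like(ids, dtype=torch.bool)
mask[0, [1, 3]] = True                      # positions the generator corrupts
gen_logits = torch.randn(1, 6, 10)          # stand-in for generator output
corrupted, labels = rtd_labels(ids, mask, gen_logits)

# The discriminator head outputs one logit per token; RTD loss is BCE.
disc_logits = torch.randn(1, 6)             # stand-in for discriminator output
loss = F.binary_cross_entropy_with_logits(disc_logits, labels)
```

Note that with gradient-disentangled embedding sharing (which this sketch omits, as in the comment above), the generator and discriminator share token embeddings but the discriminator's RTD loss does not backpropagate into them through the generator path.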

I did some quick benchmarking and noticed that DeBERTa is twice as slow as BERT for inference. [Image: inference benchmark results]
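A quick benchmark along these lines can be sketched with `transformers`. This builds both models from randomly initialized configs (no weight downloads), so it measures architecture cost only; the layer sizes below are small, arbitrary choices so the sketch runs quickly on CPU, not the published base configurations.

```python
# Minimal inference-latency comparison sketch: DeBERTa v2 vs. BERT.
# Configs are deliberately tiny and identical in size; timings reflect
# the relative cost of the two architectures, not the released models.
import time

import torch
from transformers import BertConfig, BertModel, DebertaV2Config, DebertaV2Model

def time_model(model, input_ids, n_runs=5):
    """Average forward-pass time in seconds over n_runs (after a warm-up)."""
    model.eval()
    with torch.no_grad():
        model(input_ids)  # warm-up run
        start = time.perf_counter()
        for _ in range(n_runs):
            model(input_ids)
    return (time.perf_counter() - start) / n_runs

shared = dict(hidden_size=256, num_hidden_layers=4,
              num_attention_heads=4, intermediate_size=1024)
bert = BertModel(BertConfig(**shared))
deberta = DebertaV2Model(DebertaV2Config(**shared))

input_ids = torch.randint(0, 1000, (1, 128))
print(f"BERT:    {time_model(bert, input_ids) * 1000:.1f} ms")
print(f"DeBERTa: {time_model(deberta, input_ids) * 1000:.1f} ms")
```

The gap is expected to grow with sequence length, since DeBERTa's disentangled attention adds extra content-to-position and position-to-content attention terms on top of the standard content-to-content attention that BERT computes.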

WissamAntoun avatar Apr 13 '22 13:04 WissamAntoun