
DeBERTa-like attention mechanism

Open thomasw21 opened this issue 2 years ago • 0 comments

In this issue, we discuss how viable and worthwhile it would be to implement a DeBERTa-like attention mechanism:

https://arxiv.org/abs/2006.03654

Things to take into account:

  • performance gains: check with an HF pretrained model first to see whether it helps?
  • implementation cost: how much effort would it take to implement this feature?
  • implementation feasibility: it might not work well with the Megatron-DeepSpeed setup; we need to check that.
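
To make the cost and feasibility questions concrete, here is a minimal, unoptimized sketch of the disentangled attention scores described in the linked paper, for a single head and in pure Python. The function and argument names (`rel_index`, `disentangled_scores`, `qc`, `kc`, `qr`, `kr`) are made up for illustration and are not taken from Megatron-DeepSpeed or any library. The point it tries to show is that, on top of the usual content-to-content term, each score needs two lookups into relative-position projections indexed by a clamped distance, which is the part a fused or tensor-parallel attention kernel would have to accommodate.

```python
import math


def rel_index(i, j, k):
    """Clamped relative distance delta(i, j), mapped into [0, 2k)."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def disentangled_scores(qc, kc, qr, kr, k):
    """Unnormalized attention scores for one head (illustrative only).

    qc, kc: content queries / keys, one vector per token (n x d).
    qr, kr: relative-position queries / keys, one vector per
            distance bucket (2k x d).
    """
    n, d = len(qc), len(qc[0])
    scale = 1.0 / math.sqrt(3 * d)  # three summed terms, hence 3 * d
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c2c = dot(qc[i], kc[j])                   # content-to-content
            c2p = dot(qc[i], kr[rel_index(i, j, k)])  # content-to-position
            p2c = dot(kc[j], qr[rel_index(j, i, k)])  # position-to-content
            scores[i][j] = (c2c + c2p + p2c) * scale
    return scores
```

A real implementation would batch these lookups as gather operations over an `n x n` index matrix, so the extra memory traffic and its interaction with the existing attention kernels are probably the main things to measure.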

thomasw21, Aug 05 '21 00:08