Add support for learnable relative position encoding
Relative position encoding is useful for text of arbitrary length. Our DeBERTa model already has a relative positional encoding, but right now it only returns the repeated embedding matrix: code link
I made a quick implementation based on the TF Model Garden's offering (not fully tested): https://colab.research.google.com/gist/chenmoneygithub/bd44a36f9249a2715b0ccb8b18733f14/learnable-relative-postional-encoding.ipynb
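Roughly, the idea is something like the sketch below. This is a minimal, untested sketch of a learnable relative position embedding layer; the class name, arguments, and initializer are placeholders of my own choosing and not the exact Model Garden API:

```python
import tensorflow as tf
from tensorflow import keras


class LearnableRelativePositionEmbedding(keras.layers.Layer):
    """Learns one embedding vector per clipped relative offset.

    For a sequence of length L, the relative offset between a query position
    i and a key position j is (j - i), clipped to [-max_distance, max_distance]
    and shifted to a non-negative index into a learnable table of shape
    (2 * max_distance + 1, output_dim).
    """

    def __init__(self, output_dim, max_distance=128, **kwargs):
        super().__init__(**kwargs)
        self.output_dim = output_dim
        self.max_distance = max_distance

    def build(self, input_shape):
        # One learnable vector per possible clipped relative offset.
        self.embeddings = self.add_weight(
            name="relative_embeddings",
            shape=(2 * self.max_distance + 1, self.output_dim),
            initializer="glorot_uniform",
            trainable=True,
        )
        super().build(input_shape)

    def call(self, inputs):
        # Only the sequence length of `inputs` is used; the output is a
        # (seq_len, seq_len, output_dim) tensor of relative embeddings.
        seq_len = tf.shape(inputs)[1]
        positions = tf.range(seq_len)
        # relative_offsets[i, j] = j - i
        relative_offsets = positions[None, :] - positions[:, None]
        clipped = tf.clip_by_value(
            relative_offsets, -self.max_distance, self.max_distance
        )
        # Shift into [0, 2 * max_distance] so it can index the table.
        indices = clipped + self.max_distance
        return tf.gather(self.embeddings, indices)
```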
Let's discuss whether we want this layer; if so, we can probably mark this issue as contributions welcome.
Would this be something we could use for DeBERTa @chenmoneygithub @abheesht17, if we get the right initialization? Or are the weights/graph too different?
If we cannot use this for DeBERTa, which models do use this style of relative embedding?
We can use it for DeBERTa, if my understanding is correct; Abheesht should have more context.
It's just a general approach (paper), and I am planning to use it for a custom text summarization model, which cannot use a standard positional embedding because of the input size.
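For example, a layer like the sketch in the issue description keeps the same weights for any sequence length, so it can be applied to inputs longer than anything seen in training (hypothetical usage, reusing the placeholder class name from above):

```python
import tensorflow as tf

# LearnableRelativePositionEmbedding is the placeholder sketch from the
# issue description above; the weight table depends only on max_distance,
# not on the sequence length.
layer = LearnableRelativePositionEmbedding(output_dim=64, max_distance=128)
short_batch = tf.zeros((2, 512, 64))
long_batch = tf.zeros((2, 4096, 64))
print(layer(short_batch).shape)  # (512, 512, 64)
print(layer(long_batch).shape)   # (4096, 4096, 64)
```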