convert_checkpoint_to_lsg

Convert T5 models to Long T5

Open · dafraile opened this issue Oct 31 '22 · 2 comments

Hi, thanks for creating this script, amazing work! I was wondering if you have any plans to create a conversion script for T5-based models, or if you think there are any major difficulties in converting T5 models compared to other architectures.

Thanks,

David

dafraile · Oct 31 '22 00:10

hi @dafraile

T5 is planned at some point, but there are some caveats:

  • T5 relies on a relative positional embedding. It is added directly to the attention score matrix, so you have to compute both Q @ K.T and a relative positional score matrix, which is inefficient for very long sequences (see the sketch after this list). This is not the case for BART/Pegasus models.
  • While the relative positional score matrix is not that difficult to compute for local attention, it is not compatible with most LSG sparse attention patterns. There are also no specific rules for the global tokens that are prepended.
  • I'd say that LSG-T5 is much more difficult to build because I have to rethink some things specifically for this model.
  • If you really need to use T5 right now, there is the LongT5 model on HuggingFace (loading example after this list). It is somewhat similar to LSG but less efficient, and it is trained from scratch, so it is not based on an existing "short" T5 checkpoint.
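To make the first point concrete, here is a rough PyTorch sketch of T5-style attention (illustrative only, not the actual LSG or HuggingFace implementation; names and shapes are hypothetical). The full seq_len x seq_len bias matrix has to be materialized alongside Q @ K.T, so both terms grow quadratically with sequence length:

```python
import torch

def t5_style_attention(q, k, v, rel_bias):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # rel_bias: (1, heads, seq_len, seq_len) learned relative position scores
    # T5 adds the bias to the raw scores (and omits 1/sqrt(d) scaling),
    # so a dense seq_len x seq_len bias matrix is needed per head.
    scores = torch.matmul(q, k.transpose(-1, -2))  # (batch, heads, L, L)
    scores = scores + rel_bias                     # second dense L x L term
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

# Both `scores` and `rel_bias` are O(L^2) in memory: at L = 16384, a single
# fp32 head already needs ~1 GiB for each matrix, which is why sparse/local
# patterns want to avoid materializing them.
```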

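And for the last point, a sketch of how you would load LongT5 from transformers (google/long-t5-tglobal-base is one of the public checkpoints; substitute the size/variant you need):

```python
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

ckpt = "google/long-t5-tglobal-base"  # public checkpoint; pick size/variant as needed
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = LongT5ForConditionalGeneration.from_pretrained(ckpt)

inputs = tokenizer("summarize: " + "some very long document ...",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the base checkpoints are pretrained only, so they still need fine-tuning on your downstream task, which is consistent with the "trained from scratch" caveat above.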
ccdv-ai · Oct 31 '22 10:10

Thank you! That's very enlightening. Yes, I guess for now the only viable option is to use the existing LongT5 and fine-tune on top of it, rather than converting already-trained T5 checkpoints into LSG.

Cheers

dafraile · Oct 31 '22 23:10