
Support for Encoder-Decoder-style architectures

bilelomrani1 opened this issue 2 years ago · 2 comments

I regularly follow the developments on this project, and I must say that I am very interested and pleased with the direction curated-transformers is taking. The code is very understandable and high-quality; it's a pleasure to work with. Congratulations!

This is perhaps already in your plans, but just to mention it here: I think a very nice addition to the project would be at least one reference implementation of an encoder-decoder-style Transformer, such as the T5 architecture. T5 models are very popular for some tasks, especially in the < 1B parameter range, which is still very relevant nowadays. We currently have reference implementations for decoder-style and encoder-style models, but we're missing at least one reference implementation of an encoder-decoder-style architecture, perhaps with a reusable cross-attention block.
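For concreteness, here is a minimal sketch of what such a reusable cross-attention block could look like in plain PyTorch. The class and parameter names (`CrossAttentionBlock`, `hidden_width`, etc.) are illustrative only, not existing curated-transformers API:

```python
import math

import torch
from torch import Tensor, nn


class CrossAttentionBlock(nn.Module):
    """Minimal multi-head cross-attention: queries come from the decoder,
    keys/values from the encoder output."""

    def __init__(self, hidden_width: int, n_heads: int):
        super().__init__()
        if hidden_width % n_heads != 0:
            raise ValueError("hidden_width must be divisible by n_heads")
        self.n_heads = n_heads
        self.head_width = hidden_width // n_heads
        self.query = nn.Linear(hidden_width, hidden_width)
        self.key = nn.Linear(hidden_width, hidden_width)
        self.value = nn.Linear(hidden_width, hidden_width)
        self.output = nn.Linear(hidden_width, hidden_width)

    def _split_heads(self, x: Tensor) -> Tensor:
        # (batch, seq, hidden) -> (batch, heads, seq, head_width)
        batch, seq, _ = x.shape
        return x.view(batch, seq, self.n_heads, self.head_width).transpose(1, 2)

    def forward(self, decoder_hidden: Tensor, encoder_hidden: Tensor) -> Tensor:
        # Queries are projected from the decoder states, keys and values
        # from the encoder states -- this is what distinguishes cross-
        # attention from self-attention.
        q = self._split_heads(self.query(decoder_hidden))
        k = self._split_heads(self.key(encoder_hidden))
        v = self._split_heads(self.value(encoder_hidden))

        # Scaled dot-product attention over the encoder sequence.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_width)
        weights = scores.softmax(dim=-1)
        attn = weights @ v

        # (batch, heads, seq, head_width) -> (batch, seq, hidden)
        batch, _, seq, _ = attn.shape
        attn = attn.transpose(1, 2).reshape(batch, seq, -1)
        return self.output(attn)
```

In a full decoder layer this block would sit between the causal self-attention and feed-forward sublayers, with the same encoder output reused as keys/values across all decoder layers, which is why factoring it out as a reusable block seems attractive.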

bilelomrani1 · Oct 02 '23 21:10

Good suggestion. Support for encoder-decoder architectures is definitely planned. The reason we don't have them yet is that we first focused on encoder-only models to cover the standard spaCy pipelines, and then on decoder-only models for common LLMs, but encoder-decoder support is something we want.

danieldk · Oct 03 '23 13:10

That's understandable; thank you for the clarification.

bilelomrani1 · Oct 05 '23 19:10