Most of the optimizations described in this post should be easy to add: https://pytorch.org/blog/accelerating-generative-ai-2/
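As a rough illustration, one of the optimizations from that post is `torch.compile` with CUDA graphs. A minimal sketch using a stand-in module (not the actual Curated Transformers model classes):

```python
import torch
from torch import nn

# Stand-in decoder layer; in practice this would be a Curated Transformers
# decoder module.
decoder = nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True)

# "reduce-overhead" enables CUDA graphs, which the linked post uses to cut
# per-step kernel launch overhead during decoding.
compiled_decoder = torch.compile(decoder, mode="reduce-overhead")
```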
Work-in-progress branch: https://github.com/shadeMe/curated-transformers/tree/staging/feature/deberta-encoder (related to #347).
Add support for the Mistral architecture. Work-in-progress branch: https://github.com/danieldk/curated-transformers/tree/feature/mistral
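One of Mistral's distinguishing features is sliding-window attention. A minimal sketch of the corresponding attention mask (illustrative only; the function name is not from the branch above):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where a query position may attend to a key position: causal, and at
    # most `window - 1` positions back (Mistral 7B uses a window of 4096).
    positions = torch.arange(seq_len)
    offset = positions.unsqueeze(1) - positions.unsqueeze(0)  # query - key
    return (offset >= 0) & (offset < window)
```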
See https://arxiv.org/abs/2309.17453 (Efficient Streaming Language Models with Attention Sinks).
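The paper keeps a few initial "attention sink" tokens plus a recent window in the KV cache. A rough sketch of that eviction policy (hypothetical helper, not part of Curated Transformers):

```python
import torch

def evict_kv_cache(keys, values, n_sink=4, window=1020):
    # keys, values: (batch, n_heads, seq_len, head_dim). Keep the first
    # `n_sink` positions (attention sinks) and the `window` most recent ones,
    # dropping everything in between.
    seq_len = keys.shape[2]
    if seq_len <= n_sink + window:
        return keys, values
    keep = torch.cat([torch.arange(n_sink), torch.arange(seq_len - window, seq_len)])
    return keys[:, :, keep], values[:, :, keep]
```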
In many applications only the last layer's output is needed, and dropping references to intermediate layer outputs can save memory during inference.
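A minimal sketch of the difference (generic PyTorch, not the Curated Transformers API): returning all layer outputs keeps every intermediate tensor alive, while rebinding a single variable lets each one be freed as soon as the next layer has consumed it.

```python
import torch
from torch import nn

layers = nn.ModuleList(nn.Linear(1024, 1024) for _ in range(12))

def forward_keep_all(hidden: torch.Tensor) -> list[torch.Tensor]:
    # Every intermediate output stays referenced until the caller drops the list.
    outputs = []
    for layer in layers:
        hidden = layer(hidden)
        outputs.append(hidden)
    return outputs

def forward_last_only(hidden: torch.Tensor) -> torch.Tensor:
    # Rebinding `hidden` releases each intermediate tensor once the next layer
    # has produced its output, so peak memory stays roughly constant.
    for layer in layers:
        hidden = layer(hidden)
    return hidden

with torch.inference_mode():
    out = forward_last_only(torch.randn(8, 128, 1024))
```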
Some models deviate so much from standard transformer encoders/decoders (e.g. DeBERTa and the old Falcon architecture) that we probably should not support them in mainline Curated Transformers, to avoid cluttering the codebase.
Expose more useful outputs, such as logits, through the `Generator` interface. Also fixes #311.
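A hypothetical shape for such an output (not the actual `Generator` API; the class and field names are made up for illustration):

```python
from dataclasses import dataclass
import torch

@dataclass
class GenerationStepOutput:
    # Hypothetical container: the token chosen at this step per sequence...
    token_ids: torch.Tensor  # (batch_size,)
    # ...plus the raw logits the choice was sampled from.
    logits: torch.Tensor     # (batch_size, vocab_size)
```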