Modular Attention
🚀 Feature Request
A general, parametrizable, and hookable self-attention function that ALiBi and other future algorithm implementations can call into in order to change how attention is computed.
Motivation
Right now only ALiBi modifies the attention function, but many other research papers propose useful changes to it.
With the current implementation, each algorithm would replace the attention function outright, overwriting the others. This means only the last one applied would have any effect, breaking the combination of multiple algorithms.
[Optional] Implementation
There are a few ways to do this, but a simple one might be an Attention class with conditions inside it, or one that retains a set of hook functions, each taking a shared state as an argument, and calls them in succession (see the sketch below).
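A minimal sketch of the second option, assuming nothing about Composer's internals: `ModularAttention`, `AttentionState`, and `register_score_hook` are hypothetical names, and hooking just before the softmax is only one possible choice, not a proposal for the actual API.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

import torch


@dataclass
class AttentionState:
    """Mutable state that each hook may inspect or modify."""
    query: torch.Tensor                     # (batch, heads, seq, head_dim)
    key: torch.Tensor
    value: torch.Tensor
    scores: Optional[torch.Tensor] = None   # (batch, heads, seq, seq)
    mask: Optional[torch.Tensor] = None


class ModularAttention(torch.nn.Module):
    """Scaled dot-product attention that runs registered hooks before the softmax."""

    def __init__(self) -> None:
        super().__init__()
        self._score_hooks: List[Callable[[AttentionState], None]] = []

    def register_score_hook(self, hook: Callable[[AttentionState], None]) -> None:
        """Hooks run in registration order, each mutating the shared state in place."""
        self._score_hooks.append(hook)

    def forward(self, query, key, value, mask=None):
        state = AttentionState(query=query, key=key, value=value, mask=mask)
        state.scores = query @ key.transpose(-2, -1) / math.sqrt(query.size(-1))

        # An ALiBi hook would add its distance-based bias to state.scores here,
        # and a second algorithm's hook can run right after it instead of
        # overwriting the whole attention function.
        for hook in self._score_hooks:
            hook(state)

        if state.mask is not None:
            state.scores = state.scores.masked_fill(state.mask == 0, float("-inf"))
        return torch.softmax(state.scores, dim=-1) @ state.value
```

With something along these lines, ALiBi would register one hook and another algorithm a second, and both modifications would be applied in order rather than the last surgery pass silently winning.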
Additional context
I spent a few days poking at implementing a different attention algorithm, and I was disappointed that there seemed to be no good way to interoperate with ALiBi.
Hi @xloem, thanks so much for the feature request.
I want to keep this request open because it fits into our long-term agenda: we're aware that this is a limitation on composing multiple attention-focused surgery algorithms. But I can't give you an exact timeframe yet.
As you've likely noticed, our transformer surgery algorithms are built to support huggingface models. So, we're limited by the level of modularity in that library. Different model types (e.g., bert and gpt-2) organize their modules differently, and, for a variety of reasons, we can't simply alter these models to impose a consistent, modular design (even just for attention). We're actively considering different ways we can get around this issue in order to create the level of interoperability you've requested.
In the meantime, if you'd be willing to elaborate on the types of use cases you'd want us to support, that would be valuable feedback.
Closing for now since we're tracking this elsewhere, though it's low priority. We're open to community PRs!