panndas
develop attention-only transformers examples
This paper provides a mathematical framework for thinking about attention-only transformers.
If we drop the softmax, an attention-only transformer becomes a pretty solid demo for Transformers in panndas. The details of the problem are borrowed from Brandon Rohrer's Transformers tutorial.
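To make "dropping the softmax" concrete, here is a minimal sketch of one softmax-free, causally masked attention head written with plain pandas DataFrames. It does not use the panndas API. The vocabulary, the hand-picked weight matrices, and the names `softmax_free_attention`, `W_Q`, `W_K`, and `W_V` are all hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy setup: a two-token sequence with one-hot embeddings.
features = ["battery", "program", "ran"]
positions = ["pos0", "pos1"]
x = pd.DataFrame(
    [[1.0, 0.0, 0.0],   # pos0 holds "battery"
     [0.0, 0.0, 1.0]],  # pos1 holds "ran"
    index=positions,
    columns=features,
)

# Hand-picked weights (assumptions, not learned):
# the query for "ran" looks for an earlier "battery".
W_Q = pd.DataFrame(0.0, index=features, columns=features)
W_Q.loc["ran", "battery"] = 1.0
# Keys are the embeddings themselves.
W_K = pd.DataFrame(np.eye(len(features)), index=features, columns=features)
# Values map an attended token to a feature of the predicted next word.
W_V = pd.DataFrame(0.0, index=features, columns=["down", "please"])
W_V.loc["battery", "down"] = 1.0
W_V.loc["program", "please"] = 1.0


def softmax_free_attention(x, W_Q, W_K, W_V):
    """One attention head with the softmax removed: (mask * Q K^T) V."""
    q = x @ W_Q
    k = x @ W_K
    v = x @ W_V
    scores = q @ k.T  # rows: query positions, columns: key positions
    # Causal mask: each position may attend only to itself and earlier positions.
    mask = pd.DataFrame(
        np.tril(np.ones(scores.shape)),
        index=scores.index,
        columns=scores.columns,
    )
    return (scores * mask) @ v


out = softmax_free_attention(x, W_Q, W_K, W_V)
print(out)
```

Without the softmax every step is a labeled matrix product, so the row and column names of each intermediate DataFrame show which token attends to which and which output feature it writes.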