equinox

Lots of improvements to attention

Open • patrick-kidger opened this issue 1 year ago • 1 comment

  • Support for autoregressive attention;
    • Includes support for zero-length queries, e.g. when populating the caches for the prompt.
  • Causal masking available by passing mask="causal";
  • Support for multi-query attention.
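To make the three items above concrete, here is a minimal sketch in plain JAX. It is not the code from this change: the function names `causal_mask` and `multi_query_attention`, the array layouts, and the convention that the queries are the last positions of the key/value sequence are all assumptions made for illustration. Only the `mask="causal"` spelling comes from the description above.

```python
import jax
import jax.numpy as jnp


def causal_mask(query_len, kv_len):
    # Assumption: the queries are the *last* `query_len` positions of the
    # key/value sequence, as happens when decoding against a cache.
    offset = kv_len - query_len
    q_pos = jnp.arange(query_len)[:, None] + offset
    k_pos = jnp.arange(kv_len)[None, :]
    return k_pos <= q_pos  # shape (query_len, kv_len), True = may attend


def multi_query_attention(q, k, v, mask=None):
    # Hypothetical layout:
    #   q: (query_len, num_heads, head_dim)
    #   k, v: (kv_len, head_dim), one key/value head shared by every query head
    head_dim = q.shape[-1]
    logits = jnp.einsum("qhd,kd->hqk", q, k) / jnp.sqrt(head_dim)
    if isinstance(mask, str) and mask == "causal":
        mask = causal_mask(q.shape[0], k.shape[0])
    if mask is not None:
        # Disallowed positions get -inf so that softmax assigns them zero weight.
        logits = jnp.where(mask, logits, -jnp.inf)
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("hqk,kd->qhd", weights, v)
```

With a zero-length query (shape `(0, num_heads, head_dim)`) every intermediate array simply has a zero-sized axis, so the same code path returns an empty output of shape `(0, num_heads, head_dim)` rather than needing a special case.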

Still TODO:

  • Support biases, not just masks;
  • Interpolate between multi-head attention (MHA) and multi-query attention (MQA);
  • Have KV caching not push elements backwards at the end;
  • ~Cast softmax to float32~ [Done elsewhere!]
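For the first two TODO items, the following is one possible reading, sketched in plain JAX rather than taken from this change. It assumes that "interpolate between MHA and MQA" means sharing each key/value head among a group of query heads, and that a bias is an additive term on the attention logits (so a mask is the special case of `0` / `-inf` entries). The name `grouped_attention` and the array layouts are hypothetical.

```python
import jax
import jax.numpy as jnp


def grouped_attention(q, k, v, bias=None):
    # Hypothetical layout:
    #   q: (query_len, num_heads, head_dim)
    #   k, v: (kv_len, num_kv_heads, head_dim)
    # num_kv_heads == num_heads gives ordinary MHA; num_kv_heads == 1 gives MQA.
    query_len, num_heads, head_dim = q.shape
    num_kv_heads = k.shape[1]
    assert num_heads % num_kv_heads == 0
    group = num_heads // num_kv_heads
    # Query head h uses key/value head h // group.
    q = q.reshape(query_len, num_kv_heads, group, head_dim)
    logits = jnp.einsum("qgnd,kgd->gnqk", q, k) / jnp.sqrt(head_dim)
    if bias is not None:
        # Assumed broadcastable to (query_len, kv_len); -inf entries act as a mask.
        logits = logits + bias
    # Softmax in float32 for numerical stability, then back to the value dtype.
    weights = jax.nn.softmax(logits.astype(jnp.float32), axis=-1).astype(v.dtype)
    out = jnp.einsum("gnqk,kgd->qgnd", weights, v)
    return out.reshape(query_len, num_heads, head_dim)
```

Treating a mask as a bias keeps a single code path, at the cost of materialising a floating-point array where a boolean one would do.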

patrick-kidger, May 08 '23 21:05