equinox
Added "process_heads" to MultiheadAttention
While checking PR #568, I noticed that the "process_heads" part shouldn't be part of the RoPE embeddings PR, since it's a separate feature. In theory, you could process the heads in any way you want, not just with RoPE.
Therefore, I thought it'd be best to split the work into smaller, more manageable PRs.
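
To illustrate the idea of a generic "process_heads" hook, here is a minimal standalone sketch in plain JAX. It is not the code from this PR or from Equinox: `attend_with_hook` and `l2_normalise_qk` are hypothetical names, and the assumed shapes are `(seq, num_heads, head_size)` for the already-projected query, key and value heads. The hook receives the three per-head arrays, returns three arrays of the same shapes, and runs just before the attention weights are computed.

```python
import jax
import jax.numpy as jnp


def attend_with_hook(q_heads, k_heads, v_heads, process_heads=None):
    """Toy multi-head attention over already-projected heads (illustrative only).

    q_heads, k_heads: (seq, num_heads, qk_size); v_heads: (seq, num_heads, vo_size).
    """
    if process_heads is not None:
        shapes = (q_heads.shape, k_heads.shape, v_heads.shape)
        # The hook may transform the heads arbitrarily (RoPE, normalisation, ...).
        q_heads, k_heads, v_heads = process_heads(q_heads, k_heads, v_heads)
        if (q_heads.shape, k_heads.shape, v_heads.shape) != shapes:
            raise ValueError("process_heads must preserve the shapes of its inputs")
    scale = jnp.sqrt(q_heads.shape[-1])
    # Per-head scaled dot-product attention.
    logits = jnp.einsum("qhd,khd->hqk", q_heads, k_heads) / scale
    weights = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("hqk,khd->qhd", weights, v_heads)


def l2_normalise_qk(q, k, v):
    # Example hook unrelated to RoPE: L2-normalise queries and keys per head.
    q = q / jnp.linalg.norm(q, axis=-1, keepdims=True)
    k = k / jnp.linalg.norm(k, axis=-1, keepdims=True)
    return q, k, v


kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(kq, (5, 2, 4))  # 5 query positions, 2 heads
k = jax.random.normal(kk, (7, 2, 4))  # 7 key positions
v = jax.random.normal(kv, (7, 2, 3))

out_plain = attend_with_hook(q, k, v)
out_hooked = attend_with_hook(q, k, v, process_heads=l2_normalise_qk)
```

Because the hook only sees and returns the per-head arrays, a RoPE transform would be just one possible callable passed this way, which is why the two changes can live in separate PRs.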