Hi there, I've just translated the Mamba layer from here to Equinox. Would you accept a PR for this?
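To give a sense of the shape of the thing, here's a rough sketch of the core selective-scan recurrence written the Equinox way (this is only illustrative, not the actual code I'd PR; the class name `SelectiveSSM` and parameters like `d_inner`, `d_state`, `dt_rank` are placeholders):

```python
import jax
import jax.numpy as jnp
import equinox as eqx


class SelectiveSSM(eqx.Module):
    """Illustrative core of a Mamba block: the selective state-space scan."""

    A_log: jax.Array       # (d_inner, d_state), log of the negated state matrix
    D: jax.Array           # (d_inner,), skip connection
    x_proj: eqx.nn.Linear  # projects input to (delta, B, C)
    dt_proj: eqx.nn.Linear # projects delta back up to d_inner

    def __init__(self, d_inner, d_state, dt_rank, *, key):
        k1, k2 = jax.random.split(key)
        self.A_log = jnp.log(jnp.broadcast_to(jnp.arange(1.0, d_state + 1.0), (d_inner, d_state)))
        self.D = jnp.ones(d_inner)
        self.x_proj = eqx.nn.Linear(d_inner, dt_rank + 2 * d_state, use_bias=False, key=k1)
        self.dt_proj = eqx.nn.Linear(dt_rank, d_inner, use_bias=True, key=k2)

    def __call__(self, x):  # x: (seq_len, d_inner)
        A = -jnp.exp(self.A_log)                                   # (d_inner, d_state)
        dbc = jax.vmap(self.x_proj)(x)                             # (seq_len, dt_rank + 2*d_state)
        dt_rank, d_state = self.dt_proj.in_features, A.shape[1]
        delta, B, C = jnp.split(dbc, [dt_rank, dt_rank + d_state], axis=-1)
        delta = jax.nn.softplus(jax.vmap(self.dt_proj)(delta))     # (seq_len, d_inner)

        # Discretise: A_bar = exp(delta * A); B_bar * x = delta * B * x.
        dA = jnp.exp(delta[:, :, None] * A[None, :, :])            # (seq_len, d_inner, d_state)
        dBx = delta[:, :, None] * B[:, None, :] * x[:, :, None]    # (seq_len, d_inner, d_state)

        def step(h, inputs):
            dA_t, dBx_t, C_t = inputs
            h = dA_t * h + dBx_t                                   # (d_inner, d_state)
            y = (h * C_t[None, :]).sum(-1)                         # (d_inner,)
            return h, y

        h0 = jnp.zeros((x.shape[1], d_state))
        _, ys = jax.lax.scan(step, h0, (dA, dBx, C))
        return ys + x * self.D                                     # (seq_len, d_inner)
```

Usage would look like `SelectiveSSM(d_inner=64, d_state=16, dt_rank=4, key=jax.random.PRNGKey(0))(x)` with `x` of shape `(seq_len, d_inner)`; the actual layer also has the input/output projections, conv1d, and gating around this.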
PS: To get the most out of Mamba, we'd need to write some Pallas code akin to FlashAttention, and this PR wouldn't include that. FWIW, on my 3090 the MHA and Mamba implementations ran at about the same speed.