
Mamba in Long Range Arena (LRA)

Open Antidotec opened this issue 1 year ago • 5 comments

Has anyone tested Mamba on the LRA tasks? I tried replacing the S4 network with the MixerModel provided by this repository, but the results were poor on some tasks. Do you have any suggestions?

Antidotec avatar Apr 07 '24 06:04 Antidotec

@Antidotec, can you tell me how to construct the inference pipeline for LRA on Mamba models?

pragyasrivastava0805 avatar May 03 '24 21:05 pragyasrivastava0805

I am doing the same experiment on Pathfinder, and I also find that the model doesn't train...

ngdxzy avatar Jul 06 '24 18:07 ngdxzy

I am doing exactly the same work. I replaced the S4 block in the S4 architecture with Mamba's selective SSM. Currently I am getting very low scores on LRA ListOps and Text, and the interim results for Retrieval (training now) do not look good either.

msjun23 avatar Jul 10 '24 07:07 msjun23

We did not try LRA with Mamba. We don't believe that it's a good dataset, e.g. see: https://openreview.net/forum?id=PdaPky8MUn

With that said, in early versions I quickly tested the Retrieval (AAN) dataset. As discussed in the end of the Mamba paper, we believe it should be good on data such as text and not as good on data such as images (e.g. the Image/Pathfinder tasks). IIRC it performed pretty fine on Retrieval, comparable to S4.

Another approach you can consider is hybrids of different SSMs, e.g. interleaving S4 and Mamba blocks.
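For illustration, the interleaving pattern could look like the sketch below. This is only a structural outline: `S4Block`, `MambaBlock`, and `build_hybrid` are placeholder names invented here, not APIs from this repository, and the real blocks would be PyTorch modules with state-space parameters.

```python
# Minimal sketch of an interleaved hybrid stack.
# S4Block and MambaBlock stand in for the real layer implementations.

class S4Block:
    kind = "s4"

class MambaBlock:
    kind = "mamba"

def build_hybrid(n_layers, pattern=(S4Block, MambaBlock)):
    """Cycle through the pattern so layers alternate: S4, Mamba, S4, Mamba, ..."""
    return [pattern[i % len(pattern)]() for i in range(n_layers)]

layers = build_hybrid(4)
print([b.kind for b in layers])  # → ['s4', 'mamba', 's4', 'mamba']
```

The ratio and ordering of block types (e.g. one Mamba block every few S4 blocks) is itself a hyperparameter worth sweeping.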

albertfgu avatar Jul 10 '24 18:07 albertfgu

> We did not try LRA with Mamba. We don't believe that it's a good dataset, e.g. see: https://openreview.net/forum?id=PdaPky8MUn
>
> With that said, in early versions I quickly tested the Retrieval (AAN) dataset. As discussed in the end of the Mamba paper, we believe it should be good on data such as text and not as good on data such as images (e.g. the Image/Pathfinder tasks). IIRC it performed pretty fine on Retrieval, comparable to S4.
>
> Another approach you can consider is hybrids of different SSMs, e.g. interleaving S4 and Mamba blocks

Thanks for replying! That is exactly what I observed. I also find that Mamba trains more slowly on the Copying dataset than S4D, while S4D fails on Selective Copying.

ngdxzy avatar Jul 10 '24 18:07 ngdxzy