LightM-UNet

SS2D or CSM

Open fceex49 opened this issue 2 years ago • 2 comments

Hi,

many thanks for your great work!

I have a question about SS2D (or CSM), which was proposed in VMamba. Your variant of the VSS block does not use SS2D/CSM. Instead, you flatten the input and feed it directly into the SSM (S6).

Can this approach really capture the spatial 2D information in images?
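To make the concern concrete, here is a minimal sketch in plain Python (not code from LightM-UNet) of what row-major flattening does to neighbour distances. The helper name `flat_index` and the 4x4 feature-map size are made up for illustration, and row-major order is an assumption about how the flattening is done.

```python
def flat_index(row, col, width):
    """Position of pixel (row, col) in the sequence after row-major flattening."""
    return row * width + col


# Hypothetical 4x4 feature map; only the shape matters here.
H, W = 4, 4
center = (1, 1)
right = (1, 2)   # horizontal neighbour of center
below = (2, 1)   # vertical neighbour of center

# Distance between neighbours once the image is a 1D sequence.
dist_right = abs(flat_index(*right, W) - flat_index(*center, W))
dist_below = abs(flat_index(*below, W) - flat_index(*center, W))

print(dist_right, dist_below)  # prints "1 4"
```

Horizontal neighbours stay adjacent in the sequence, but vertical neighbours end up `W` steps apart, so a single 1D scan has to carry vertical context through its state across a whole row.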

The Vision Mamba paper also introduced a bidirectional SSM to handle spatial understanding.
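As a rough illustration of why the scan direction matters, the sketch below compares a forward-only scan with a bidirectional one. It is a toy under stated assumptions: `causal_scan` is a fixed-decay linear recurrence standing in for the SSM (the real S6 uses input-dependent parameters), and summing the two directions is an assumption, not necessarily how Vision Mamba combines them.

```python
def causal_scan(xs, decay=0.5):
    """Toy recurrence h_t = decay * h_{t-1} + x_t (stand-in for an SSM)."""
    h = 0.0
    out = []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(xs, decay=0.5):
    """Scan forward and backward, then sum the two outputs per position."""
    fwd = causal_scan(xs, decay)
    bwd = causal_scan(xs[::-1], decay)[::-1]  # scan reversed input, restore order
    return [f + b for f, b in zip(fwd, bwd)]


# A single impulse at the END of the sequence.
xs = [0.0, 0.0, 0.0, 1.0]
fwd = causal_scan(xs)
both = bidirectional_scan(xs)

print(fwd)   # [0.0, 0.0, 0.0, 1.0]
print(both)  # [0.125, 0.25, 0.5, 2.0]
```

With the forward scan alone, earlier positions never see the later token. The backward pass lets every position receive information from both sides of the sequence.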

Could you please share some insight?

Thanks

fceex49 • Apr 09 '24 18:04

I have the same questions.

DongdongMeng • Jul 18 '24 06:07

The blocks they refer to as VSS blocks are similar to basic Mamba blocks (S6). I recently read the U-Mamba paper on image segmentation as well, and their approach is close to this one (they mention using the S6 module). I think this work is a slimmed-down version of that model.

P.S. The main difference between basic Mamba and VMamba is the VSS block, which contains SS2D instead of a plain SSM.
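For comparison, here is a hedged sketch of the scan-and-merge idea behind SS2D as I understand it from the VMamba description: the feature map is unrolled along four routes (row-major, column-major, and both reversed), each route would go through its own S6, and the results are mapped back to pixel positions and summed. The function names `cross_scan` and `cross_merge` are illustrative, and the per-route S6 step is left out entirely.

```python
def cross_scan(grid):
    """Unroll a 2D grid (list of rows) into four 1D scan routes."""
    h, w = len(grid), len(grid[0])
    row_major = [grid[r][c] for r in range(h) for c in range(w)]
    col_major = [grid[r][c] for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def cross_merge(routes, h, w):
    """Map the four routes back to pixel positions and sum them."""
    row_major, col_major, row_rev, col_rev = routes
    row_rev = row_rev[::-1]  # undo the reversal
    col_rev = col_rev[::-1]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = (
                row_major[r * w + c] + row_rev[r * w + c]
                + col_major[c * h + r] + col_rev[c * h + r]
            )
    return out


grid = [[0, 1, 2],
        [3, 4, 5]]
routes = cross_scan(grid)
print(routes[1])  # column-major route: [0, 3, 1, 4, 2, 5]

# In SS2D each route would pass through an S6 block here (omitted in this sketch).
merged = cross_merge(routes, h=2, w=3)
print(merged)  # [[0, 4, 8], [12, 16, 20]]
```

Because nothing processes the routes in this sketch, merging simply returns four times the input, which is a quick check that the index bookkeeping is right. In the column-major route, vertical neighbours are adjacent, which is what a single row-major flatten lacks.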

shayan1999 • Aug 02 '24 14:08