LightM-UNet SS2D or CSM

Hi,

Many thanks for your great work!

I have a question about SS2D (also called CSM), which was proposed in VMamba. Your variant of the VSS block does not use SS2D/CSM. Instead, you flatten the input and feed it directly into the SSM (S6).

Can this approach really capture the spatial 2D information in images?
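
To make the concern concrete, here is a minimal sketch of what a plain row-major flatten does to neighbouring pixels. The helper `flatten_index` and the 4x6 feature-map size are hypothetical and chosen only for illustration; this is not code from the LightM-UNet repository.

```python
def flatten_index(row, col, width):
    """Position of pixel (row, col) in a row-major flattened sequence."""
    return row * width + col


H, W = 4, 6  # hypothetical feature-map size, for illustration only

here = flatten_index(1, 2, W)
right = flatten_index(1, 3, W)  # horizontal neighbour
below = flatten_index(2, 2, W)  # vertical neighbour

print(right - here)  # horizontal neighbours stay adjacent: 1
print(below - here)  # vertical neighbours end up W steps apart: 6
```

With a single one-directional scan over this sequence, a pixel only reaches its vertical neighbour after passing through a full row of other tokens, which is the reason for asking whether 2D structure is preserved.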

The Vision Mamba paper also introduced a bidirectional SSM to handle spatial understanding.

Could you please share some insight into this?

Thanks

Apr 09 '24 18:04 fceex49

I have the same questions.

Jul 18 '24 06:07 DongdongMeng

The blocks they refer to as VSS blocks are similar to basic Mamba blocks (S6). I recently read the U-Mamba paper on image segmentation as well, and their approach was close to this one (they mention that they used the S6 module). I think they minified that model.

P.S. The main difference between basic Mamba and VMamba is the VSS block, which contains SS2D instead of a plain SSM.
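
As a rough illustration of that difference, the sketch below lists the four traversal orders that SS2D's cross-scan uses (row-major, column-major, and the reverse of each), expressed as indices into the row-major flattened image. The function name `cross_scan_orders` is hypothetical, and this only shows the scan order, not the actual VMamba implementation. A plain flatten followed by S6 corresponds to the first order alone.

```python
def cross_scan_orders(height, width):
    """Return four scan orders as lists of row-major pixel indices."""
    # Order 1: left-to-right, top-to-bottom (what a plain flatten gives).
    row_major = [r * width + c for r in range(height) for c in range(width)]
    # Order 2: top-to-bottom, left-to-right (scan the transposed image).
    col_major = [r * width + c for c in range(width) for r in range(height)]
    # Orders 3 and 4: the same two paths traversed backwards.
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


for order in cross_scan_orders(2, 3):
    print(order)
# [0, 1, 2, 3, 4, 5]
# [0, 3, 1, 4, 2, 5]
# [5, 4, 3, 2, 1, 0]
# [5, 2, 4, 1, 3, 0]
```

In SS2D, each of these sequences is processed by its own S6 scan and the outputs are mapped back to the original pixel positions and merged, so every pixel receives context from all four directions.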

Aug 02 '24 14:08 shayan1999