
On small models, the actual GPU memory usage of Mamba2 is much higher than that of Mamba1.

AlwaysFHao opened this issue 7 months ago · 17 comments

The parameters of my Mamba2 model are d_state=32, d_conv=4, expand=2, and head_dim=32 (using the `nn.Conv1d` padding fallback, without the d_model/head_dim%8==0 constraint). Mamba1 uses the same parameters, except that it has no head_dim. Although Mamba2's inference speed has almost doubled relative to Mamba1, its actual memory usage has increased from 4.82 GB to 7.55 GB (in my task). Is this caused by the baseline computational cost of Mamba2's semiseparable matrix formulation, which would put it at a disadvantage at small model scales? I see in your paper that on larger models the actual memory usage of Mamba2 is lower.
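For reference, below is a minimal sketch of how such a comparison might be set up with the `mamba_ssm` package. The `d_model`, batch size, and sequence length are illustrative placeholders (the issue does not state the actual values), and peak memory is measured with PyTorch's allocation counters rather than the issue author's own benchmark. Note that `mamba_ssm` spells the Mamba2 head dimension `headdim`.

```python
# Hedged sketch: compare peak GPU memory of a Mamba1 vs. Mamba2 layer
# with the parameters described above. Not the author's exact benchmark;
# d_model, batch, and seqlen below are assumptions for illustration.
import torch
from mamba_ssm import Mamba, Mamba2

def peak_memory_gib(layer, batch=8, seqlen=2048, d_model=256):
    """One forward/backward pass; return peak allocated GPU memory in GiB."""
    layer = layer.cuda()
    x = torch.randn(batch, seqlen, d_model, device="cuda", requires_grad=True)
    torch.cuda.reset_peak_memory_stats()
    y = layer(x)
    y.sum().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**3

d_model = 256  # illustrative; the issue does not state the real d_model
m1 = Mamba(d_model=d_model, d_state=32, d_conv=4, expand=2)
# mamba_ssm uses `headdim` (no underscore) for the Mamba2 head dimension
m2 = Mamba2(d_model=d_model, d_state=32, d_conv=4, expand=2, headdim=32)

print(f"Mamba1 peak: {peak_memory_gib(m1):.2f} GiB")
print(f"Mamba2 peak: {peak_memory_gib(m2):.2f} GiB")
```

Calling `reset_peak_memory_stats()` immediately before the forward pass isolates the layer's own activation and workspace footprint from whatever was allocated earlier, which makes the two readings directly comparable.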

AlwaysFHao · Jul 02 '24 09:07