Issue about Computational Efficiency
Dear Author,
Thank you for your fantastic work on Vision Mamba; I am now following it.
I have some questions about the computational-efficiency analysis in Section 3.5:
I notice that you refer to Algorithm 1 there, so does Equation (8) include the whole computational cost of a Vim block, including the causal convolution and the SSM?
I do not clearly understand how it corresponds to Algorithm 1. Could you give me a simple proof to help me understand?
I also noticed that in Mamba's original GitHub repository, Albert Gu provides a 9BLDN analysis of Mamba's selective scan operation. Have you seen this? Does it approximately align with your analysis?
Thank you again, and I look forward to your response! For your convenience, I will attach issue 110 and Albert Gu's answer.
1. What Equation (8) in the Vim Paper Represents
Equation (8):
Ω(SSM) = 3M(2D)N + M(2D)N²
Here:
M = sequence length
D = hidden dimension
N = state dimension (fixed, e.g. 16 by default)
This is intended to capture the entire computational complexity of a Vim block's SSM module (not just the convolution). The complexity has:
A linear term in M: 3M(2D)N (for the projections and the multiplications with the state).
A quadratic term in N: M(2D)N², because the internal recurrence requires combining state dimensions.
So yes: Eq. (8) includes the whole computational cost of the selective scan (the SSM inside Algorithm 1), including the preprocessing that produces B, C, and Δt and the scan step itself.
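To make the accounting concrete, here is a minimal sketch that evaluates Eq. (8) numerically. `vim_ssm_flops` is a hypothetical helper (it is not part of the Vim codebase), and the example values are assumed Vim-Tiny-like settings.

```python
def vim_ssm_flops(M: int, D: int, N: int = 16) -> int:
    """FLOP estimate for the SSM module of one Vim block, per Eq. (8).

    M: sequence length, D: hidden dimension, N: state dimension.
    The (2D) factor is the expanded inner dimension (expansion ratio E = 2).
    """
    linear_term = 3 * M * (2 * D) * N       # projections / state multiplications
    quadratic_term = M * (2 * D) * N ** 2   # selective-scan recurrence
    return linear_term + quadratic_term

# Assumed Vim-Tiny-like setting: 197 tokens, D = 192, default N = 16
print(vim_ssm_flops(M=197, D=192))  # ~23.0 MFLOPs
```

Because N is fixed (16 by default), both terms grow linearly with the sequence length M, which is the point of the efficiency claim.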
2. How This Corresponds to Algorithm 1
In Algorithm 1 (Vim block), the SSM pipeline has:
Linear projections of the input into B, C, and Δt (captured in the 3M(2D)N term).
The selective-scan recurrence (captured in the M(2D)N² term).
So Eq. (8) is basically a summarized FLOP estimate of everything after the block's input linear projections, matching what Algorithm 1 describes; see the sketch below.
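As a reading aid, here is a hedged PyTorch-style sketch of that pipeline for one scan direction. The shapes follow the paper's notation, but the names (`ssm_pipeline_sketch`, `W_B`, `W_C`, `W_dt`) are illustrative assumptions; the official implementation fuses these steps into an optimized kernel.

```python
import torch

def ssm_pipeline_sketch(x, A, W_B, W_C, W_dt):
    """Illustrative SSM steps of Algorithm 1, single scan direction.

    x: (M, 2D) expanded token sequence, A: (2D, N) state matrix,
    W_B, W_C: (2D, N) projection weights, W_dt: (2D, 2D).
    """
    M, ED = x.shape                               # ED = 2D
    # Step 1: input-dependent parameters -- the projections that the
    # paper's 3M(2D)N term accounts for.
    B = x @ W_B                                   # (M, N)
    C = x @ W_C                                   # (M, N)
    dt = torch.nn.functional.softplus(x @ W_dt)   # (M, 2D) timesteps

    # Step 2: discretize and run the recurrence -- the selective scan,
    # which Eq. (8) charges as the M(2D)N^2 term.
    h = torch.zeros_like(A)                       # (2D, N) hidden state
    ys = []
    for t in range(M):                            # sequential over tokens
        A_bar = torch.exp(dt[t].unsqueeze(-1) * A)    # (2D, N) discretized A
        h = A_bar * h + dt[t].unsqueeze(-1) * B[t]    # state update
        ys.append(h @ C[t])                       # (2D,) per-token readout
    return torch.stack(ys)                        # (M, 2D)
```

Written out this way, the correspondence is direct: the matrix multiplies in step 1 give the linear term, and the per-token recurrence in step 2 is what the paper's quadratic-in-N term accounts for.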
3. Relation to Albert Gu's FLOP Analysis (9BLDN)
Albert Gu's original FLOP formula for Mamba's selective scan is:
9BLDN
Here:
B = batch size
L = sequence length
D = hidden dimension
N = state dimension
That breakdown comes from carefully counting the associative operations in the scan (2 multiplies + 1 add ≈ 3 FLOPs, repeated ~2L times per head, leading to the factor of 9).
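Under the same assumed settings, the kernel-level estimate is easy to evaluate. Note that in Gu's formula, D refers to Mamba's expanded inner dimension (twice the model width in Vim); mapping it that way is my assumption for the comparison.

```python
def selective_scan_flops(B: int, L: int, D: int, N: int) -> int:
    """Kernel-level FLOP estimate for the selective scan: 9 * B * L * D * N.

    B: batch size, L: sequence length, D: inner (expanded) dimension,
    N: state dimension.
    """
    return 9 * B * L * D * N

# Assumed setting: batch 1, 197 tokens, inner dim 2 * 192 = 384, N = 16
print(selective_scan_flops(B=1, L=197, D=384, N=16))  # ~10.9 MFLOPs
```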
4. How They Align
Albert Gu's analysis (9BLDN) is a lower-level FLOP count focused just on the scan kernel (the selective recurrence).
Vim's Eq. (8) wraps this into a higher-level block cost by also including the projections (the B, C, Δt computation) and expressing it in big-O style, i.e. O(MDN + MDN²).
So: they are consistent. The Vim paper's Equation (8) is a simplified asymptotic expression that aligns with the detailed kernel FLOP analysis (9BLDN).
Think of it this way:
Gu: exact FLOP accounting for the recurrence.
Vim paper: block-wise asymptotic complexity (linear in sequence length M, quadratic only in the fixed state size N) that includes both the projections and the scan; a numeric comparison follows below.
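To see the alignment numerically, here is a small side-by-side comparison under the same assumed Vim-Tiny-like setting; the numbers are hypothetical and only meant to show that the two estimates scale identically in sequence length and differ by a modest constant factor.

```python
M, D, N = 197, 192, 16   # tokens, model dim, state dim (assumed values)

vim_eq8 = 3 * M * (2 * D) * N + M * (2 * D) * N ** 2   # Eq. (8): whole SSM module
gu_scan = 9 * 1 * M * (2 * D) * N                      # 9BLDN: scan kernel, batch 1

print(f"Vim Eq. (8):  {vim_eq8 / 1e6:.1f} MFLOPs")     # ~23.0
print(f"Gu 9BLDN:     {gu_scan / 1e6:.1f} MFLOPs")     # ~10.9
print(f"ratio:        {vim_eq8 / gu_scan:.2f}x")       # ~2.1
```

The ratio here is exactly (3 + N) / 9, a constant: the extra factor comes from Eq. (8) also charging the B, C, Δt projections that the kernel count leaves out.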