
feat: Initial state support for Mamba SSM (1)

Open mzusman opened this pull request 1 year ago • 4 comments

Add chunked prefill / initial-state capability to the Mamba SSM (Mamba 1). This is done by prepending the last forward pass's state to the fwd pass kernel and reading the data accordingly.

Latency is not affected (a benchmark script shows similar latencies between this PR and main, ~130 ms). Added tests that check correctness when running on chunks.

Limitations:

  • Applies only to the selective scan fwd pass (the bwd pass is not supported)

This PR enables efficient speculative decoding, prefix caching, and prefill chunking; see the usage sketch below.
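For illustration, here is a minimal sketch of what chunk-by-chunk prefill could look like from Python once an initial state can be fed in. The prev_state keyword is an assumed name for the argument this PR introduces (the exact Python-level API may differ); return_last_state already exists upstream:

    import torch
    from mamba_ssm.ops.selective_scan_interface import selective_scan_fn

    batch, dim, dstate, seqlen, chunk = 2, 64, 16, 4096, 1024
    u = torch.randn(batch, dim, seqlen, device="cuda")
    delta = torch.rand(batch, dim, seqlen, device="cuda")
    A = -torch.rand(dim, dstate, device="cuda")
    B = torch.randn(batch, dstate, seqlen, device="cuda")
    C = torch.randn(batch, dstate, seqlen, device="cuda")

    state, outs = None, []
    for start in range(0, seqlen, chunk):
        sl = slice(start, start + chunk)
        out, state = selective_scan_fn(
            u[..., sl], delta[..., sl], A, B[..., sl], C[..., sl],
            delta_softplus=True,
            return_last_state=True,  # upstream flag: also return the final SSM state
            prev_state=state,        # assumed name for the new initial-state input
        )
        outs.append(out)
    # Concatenating the chunked outputs should match a single full-length scan.
    full = torch.cat(outs, dim=-1)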

FIX #233 #473 #258 #101

mzusman avatar Jul 24 '24 09:07 mzusman

@mzusman I've noticed you made changes to files in the csrc directory, but I'm having trouble getting these changes to take effect in my environment. Could you please tell me the exact instructions to rebuild and install the mamba_ssm package so the changes are applied? It seems I always get the original package when using pip install . Thank you!

daphneOdera-618 avatar Sep 02 '24 17:09 daphneOdera-618

@daphneOdera-618 Yeah, the default setup.py behaviour is to download the upstream wheel upon installing. To force a build from source, run MAMBA_FORCE_BUILD=TRUE pip install .

mzusman avatar Sep 03 '24 07:09 mzusman

Unfortunately, this PR changes the API of selective_scan_cuda.fwd in an incompatible way. The same API is also invoked in MambaInnerFn.forward in addition to SelectiveScanFn.forward, leading to runtime errors in code that uses MambaInnerFn (e.g. the Mamba implementation found in the transformers library when running in vanilla training mode without cache_params).

I think MambaInnerFn.forward could be modified to use the new API version, but I don't know how to produce the prerequisite additional empty vector (x) from what is available in MambaInnerFn.forward.
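
Concretely, the call-site difference looks roughly like this (a sketch based on my reading of the upstream code and of this PR):

    # Upstream mamba_ssm: the kernel allocates and returns the scan intermediates x.
    out, x, *rest = selective_scan_cuda.fwd(
        u, delta, A, B, C, D, z, delta_bias, delta_softplus
    )

    # With this PR: the caller must allocate x and pass it in, so that it can
    # be pre-seeded with an initial state before the kernel runs.
    out, scan_intermediates, out_z = selective_scan_cuda.fwd(
        u, delta, A, B, C, D, z, delta_bias, delta_softplus, x
    )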

jploski avatar Jun 09 '25 14:06 jploski


Since conv1d_out in MambaInnerFn seems to play the same role as u in SelectiveScanFn, adding this hack in place of the original invocation of selective_scan_cuda.fwd seems to work:

        u = conv1d_out
        # One checkpoint per 2048-timestep chunk (ceil division).
        n_chunks = (u.shape[-1] + 2048 - 1) // 2048
        # Pre-allocate the scan-intermediates buffer that the new API expects
        # as an input instead of producing it itself.
        _x = torch.zeros(
            (u.shape[0], u.shape[1], n_chunks, int(A.shape[1] * 2)),
            device=u.device,
            dtype=torch.float32,
            requires_grad=u.requires_grad,
        )
        # Seed the even slots of the first chunk with 1.
        _x[:, :, 0, 0::2] = 1
        # An initial state, if available, would presumably go into the odd slots:
        # if prev_state is not None:
        #     _x[:, :, 0, 1::2].copy_(prev_state)
        out, scan_intermediates, out_z = selective_scan_cuda.fwd(
            u, delta, A, B, C, D, z, delta_bias, delta_softplus, _x
        )
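
If I read the buffer layout correctly, the even-indexed entries along the last dimension hold the cumulative decay factors of the associative scan (hence seeding them with 1, the multiplicative identity), while the odd-indexed entries hold the per-chunk hidden state, which is why an initial state would be copied into the 1::2 slice of the first chunk. That is only my interpretation of the layout, though.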

jploski avatar Jun 09 '25 14:06 jploski