feat: Initial state support for Mamba SSM (1)

Open mzusman opened this issue 7 months ago • 2 comments

Adds chunked prefill (initial-state) support to the Mamba SSM (Mamba 1). The state from the last forward pass is prepended to the input of the forward-pass kernel, and the kernel reads the data accordingly.

Latency is not affected: the benchmark script shows similar latencies for this PR and main (130 ms). Added tests that check correctness when running on chunks.
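
For readers who want to see the property those chunk tests rely on, here is a minimal pure-Python sketch. It is not the CUDA kernel from this PR: `selective_scan_ref` and `chunked_scan` are hypothetical names, the scan handles a single channel, and the `D` skip connection and `z` gating are left out. It assumes the usual Mamba 1 recurrence `h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t`, `y_t = C_t . h_t`, and shows that scanning in chunks while carrying the final state forward reproduces the full-sequence output.

```python
import math

def selective_scan_ref(x, delta, A, B, C, initial_state=None):
    """Sequential reference scan for one channel (illustrative only).

    x, delta: lists of length L; A: list of length N;
    B, C: lists of L rows, each of length N.
    Returns (outputs, final_state).
    """
    n = len(A)
    # Start from the carried-over state if given, otherwise from zeros.
    h = list(initial_state) if initial_state is not None else [0.0] * n
    y = []
    for t in range(len(x)):
        h = [math.exp(delta[t] * A[i]) * h[i] + delta[t] * B[t][i] * x[t]
             for i in range(n)]
        y.append(sum(C[t][i] * h[i] for i in range(n)))
    return y, h

def chunked_scan(x, delta, A, B, C, chunk_size):
    """Run the scan chunk by chunk, passing each final state to the next chunk."""
    y, state = [], None
    for s in range(0, len(x), chunk_size):
        e = s + chunk_size
        y_chunk, state = selective_scan_ref(
            x[s:e], delta[s:e], A, B[s:e], C[s:e], initial_state=state)
        y.extend(y_chunk)
    return y, state

# Small made-up example: sequence length 6, state size 2.
x = [1.0, 0.5, 2.0, 1.5, 0.25, 1.0]
delta = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]
A = [-1.0, -0.5]
B = [[0.5 + 0.1 * t, 1.0] for t in range(6)]
C = [[1.0, 0.5 + 0.1 * t] for t in range(6)]

y_full, h_full = selective_scan_ref(x, delta, A, B, C)
y_chunked, h_chunked = chunked_scan(x, delta, A, B, C, chunk_size=4)

assert all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(y_full, y_chunked))
assert all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(h_full, h_chunked))
```

Dropping the carried state (calling `selective_scan_ref` on the second chunk with `initial_state=None`) gives different outputs for that chunk, which is why the kernel needs the previous state as an input.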

Limitations:

  • Applies only to the selective scan forward pass (the backward pass is not supported)

This PR enables efficient speculative decoding, prefix caching, and prefill chunking.

FIX #233 #473 #258 #101

mzusman, Jul 24 '24 09:07