feat: Initial state support for Mamba SSM (Mamba 1)
Adds chunked prefill / initial-state support to the Mamba SSM (Mamba 1). This is done by prepending the final state of the previous forward pass to the input of the forward-pass kernel. The kernel then reads that state and uses it as the starting point of the scan.
Latency is not affected: the benchmark script shows similar latencies (~130 ms) for this PR and for main.
Added tests that check correctness when the input is processed in chunks.
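Here is a minimal pure-Python sketch of why initial-state support makes chunking exact. It assumes the Mamba-1 style diagonal recurrence for a single channel: h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t, and y_t = C_t . h_t + D * x_t. The functions `selective_scan_ref` and `chunked_scan` are hypothetical illustrations and do not reflect the CUDA kernel's API. Passing each chunk's final state in as the next chunk's initial state reproduces a single full pass. The chunk-correctness tests check this property.

```python
import math
import random
from typing import List, Optional, Tuple


def selective_scan_ref(
    x: List[float],                # inputs for one channel, length L
    delta: List[float],            # per-step step sizes (positive), length L
    A: List[float],                # diagonal state matrix entries (negative), length N
    B: List[List[float]],          # input-dependent B_t, shape L x N
    C: List[List[float]],          # input-dependent C_t, shape L x N
    D: float = 0.0,                # skip connection weight
    initial_state: Optional[List[float]] = None,  # h_0, length N; zeros when None
) -> Tuple[List[float], List[float]]:
    """Sequential selective scan; returns (outputs, final state)."""
    n = len(A)
    # Copy so the caller's state is never mutated in place.
    h = list(initial_state) if initial_state is not None else [0.0] * n
    ys: List[float] = []
    for t in range(len(x)):
        for i in range(n):
            # h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t
            h[i] = math.exp(delta[t] * A[i]) * h[i] + delta[t] * B[t][i] * x[t]
        # y_t = C_t . h_t + D * x_t
        ys.append(sum(C[t][i] * h[i] for i in range(n)) + D * x[t])
    return ys, h


def chunked_scan(x, delta, A, B, C, D, chunk_size):
    """Run the scan chunk by chunk, feeding each chunk's final state forward."""
    state = None  # the first chunk starts from a zero state
    ys: List[float] = []
    for s in range(0, len(x), chunk_size):
        e = s + chunk_size
        y, state = selective_scan_ref(
            x[s:e], delta[s:e], A, B[s:e], C[s:e], D, initial_state=state
        )
        ys.extend(y)
    return ys, state


# Usage: random inputs, then compare one full pass with a chunked pass.
rng = random.Random(0)
L, N = 13, 4
x = [rng.uniform(-1, 1) for _ in range(L)]
delta = [rng.uniform(0.01, 0.5) for _ in range(L)]
A = [-rng.uniform(0.5, 2.0) for _ in range(N)]
B = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(L)]
C = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(L)]
D = 0.3

y_full, h_full = selective_scan_ref(x, delta, A, B, C, D)
y_chunk, h_chunk = chunked_scan(x, delta, A, B, C, D, chunk_size=5)
```

The chunk size of 5 deliberately leaves a shorter final chunk (5 + 5 + 3). This exercises the case where the sequence length is not a multiple of the chunk size.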
Limitations:
- Applies only to the selective scan forward pass (the backward pass is not supported).
This PR enables efficient speculative decoding, prefix caching, and chunked prefill.
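As an illustration of the prefix-caching use case, here is a toy sketch. It assumes a scalar state (N = 1) and uses a hypothetical `token_params` function as a stand-in for the model's per-token projections. Neither the function nor the cache layout comes from the PR. An SSM compresses its whole history into a fixed-size state, so caching the state reached after a shared prefix is enough to resume from it. Resuming like this requires a forward pass that accepts an initial state.

```python
import math
from typing import Dict, List, Tuple

A = -0.7  # hypothetical scalar (N = 1) state decay


def token_params(tok: int) -> Tuple[float, float, float, float]:
    """Hypothetical stand-in for the model's per-token (x, delta, B, C) values."""
    return 0.1 * tok, 0.05 + 0.01 * tok, 1.0 + 0.1 * tok, 0.5


def scan_tokens(tokens: List[int], h0: float = 0.0) -> Tuple[List[float], float]:
    """Scalar selective scan over tokens, starting from state h0."""
    h = h0
    ys: List[float] = []
    for tok in tokens:
        x, delta, b, c = token_params(tok)
        h = math.exp(delta * A) * h + delta * b * x
        ys.append(c * h)
    return ys, h


# Maps a prompt prefix to the SSM state reached after processing it.
prefix_cache: Dict[Tuple[int, ...], float] = {}


def run_with_prefix_cache(prefix: List[int], suffix: List[int]) -> List[float]:
    key = tuple(prefix)
    if key not in prefix_cache:
        # Process the shared prefix once and keep only its final state.
        _, prefix_cache[key] = scan_tokens(prefix)
    # Resume from the cached state instead of re-scanning the prefix.
    ys, _ = scan_tokens(suffix, h0=prefix_cache[key])
    return ys


system_prompt = [3, 1, 4, 1, 5]
out_a = run_with_prefix_cache(system_prompt, [9, 2, 6])
out_b = run_with_prefix_cache(system_prompt, [5, 3])
```

Both calls share one cache entry. Each call's outputs match the tail of a full scan over prefix plus suffix.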
FIX #233 #473 #258 #101