Mor Zusman

5 issues by Mor Zusman

Fixes https://github.com/microsoft/DeepSpeed/issues/2760 by correcting the `from_blob` strides. Before the fix, the functions returned key and value caches that were populated with zeros.
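To illustrate how wrong strides can surface as zeros, here is a minimal pure-Python sketch. It is not DeepSpeed's code: the buffer layout, the `read_strided` helper, and all sizes are hypothetical, chosen only to show how a `from_blob`-style view over a partially filled buffer reads padding when the row stride is too small.

```python
def read_strided(buffer, shape, strides):
    """View a flat buffer as a 2-D list using element strides (hypothetical helper)."""
    rows, cols = shape
    row_stride, col_stride = strides
    return [
        [buffer[r * row_stride + c * col_stride] for c in range(cols)]
        for r in range(rows)
    ]

# Assumed layout: each row reserves max_len slots, but only used_len are filled.
rows, max_len, used_len = 2, 4, 2
buffer = [0] * (rows * max_len)
for r in range(rows):
    for c in range(used_len):
        buffer[r * max_len + c] = 10 * (r + 1) + c  # row 0: 10, 11; row 1: 20, 21

# Wrong: strides assume the rows are tightly packed (row stride = used_len).
wrong = read_strided(buffer, (rows, used_len), (used_len, 1))
# Right: row stride matches the real allocation (row stride = max_len).
right = read_strided(buffer, (rows, used_len), (max_len, 1))

print(wrong)
print(right)
```

With the tightly packed strides, the second row is read from the unused padding of the first row, so it comes back as zeros; with the allocation-sized row stride, both rows hold the written values.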

**Describe the bug** Running a forward pass with `get_present = True` returns wrong values for the key and value tensors: they contain a lot of zeros. **To Reproduce** Steps to reproduce...

bug
inference

Add Jamba support to vLLM. This PR comprises two parts: the Jamba modeling file and the Mamba memory handling. Since Jamba is a hybrid model (which alternates between Mamba and...

Follow-up to #4115. Thanks to https://github.com/vllm-project/vllm/pull/4115#discussion_r1670888574, I noticed that `scheduler.finished_requests_ids` could be reset without sending the finished request IDs back to the workers. This PR resets `finished_requests_ids` only when the...

Add chunked-prefill / initial-state capability to the Mamba SSM (Mamba 1), done by prepending the state from the last forward pass to the FWD pass kernel...
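
The description above is cut off, but the general idea of carrying state across chunks can be sketched with a toy scalar recurrence. This is not the Mamba kernel or the vLLM implementation: the function names, the scalar parameters `a` and `b`, and passing the state as an `h0` argument (rather than prepending it to the kernel input) are all simplifying assumptions.

```python
def ssm_scan(xs, a, b, h0=0.0):
    """Run the toy recurrence h_t = a * h_{t-1} + b * x_t, starting from state h0."""
    h = h0
    outputs = []
    for x in xs:
        h = a * h + b * x
        outputs.append(h)
    return outputs, h  # per-step outputs and the final state

def chunked_scan(xs, a, b, chunk_size):
    """Process xs chunk by chunk, feeding each chunk the previous chunk's final state."""
    h = 0.0
    outputs = []
    for start in range(0, len(xs), chunk_size):
        chunk_out, h = ssm_scan(xs[start:start + chunk_size], a, b, h0=h)
        outputs.extend(chunk_out)
    return outputs, h

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
full_out, full_state = ssm_scan(xs, a=0.5, b=1.0)
chunk_out, chunk_state = chunked_scan(xs, a=0.5, b=1.0, chunk_size=2)

# Chunked processing with a carried state matches the single full pass.
print(full_out == chunk_out, full_state == chunk_state)
```

Because each chunk starts from the state the previous chunk ended with, the recurrence performs the same operations in the same order as the full pass, which is why the two results agree.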