
Results 311 mamba issues

In line 72, there is `dt = tl.minimum(tl.maximum(dt, dt_min), dt_max)`. Why is this needed?
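For readers unfamiliar with the idiom, the quoted expression has the shape of a clamp: `maximum` raises values below `dt_min`, then `minimum` lowers values above `dt_max`. The sketch below reproduces that in plain Python. The function name `clamp_dt` and the bound values are hypothetical and chosen only for illustration; they are not taken from the kernel.

```python
def clamp_dt(dt: float, dt_min: float, dt_max: float) -> float:
    """Restrict dt to [dt_min, dt_max], mirroring minimum(maximum(dt, dt_min), dt_max)."""
    return min(max(dt, dt_min), dt_max)


# Hypothetical bounds, for illustration only.
DT_MIN, DT_MAX = 0.001, 0.1

too_small = clamp_dt(0.0, DT_MIN, DT_MAX)   # raised to DT_MIN
in_range = clamp_dt(0.05, DT_MIN, DT_MAX)   # left unchanged
too_large = clamp_dt(0.5, DT_MIN, DT_MAX)   # lowered to DT_MAX
```

This only shows what the expression computes, not why the kernel applies it.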

I didn't quite understand the connection between the SSD framework and the Mamba 2 architecture. Reading the article, I got the impression that the SSM block in Mamba 2, differently...

I can reach an accuracy of 0.93 just by changing the base model to google-bert/bert-base-uncased. I downloaded the model (state-spaces/mamba-370m-hf) from https://huggingface.co/state-spaces/mamba-370m-hf. Here's my code: # Training a Classification...

![image](https://github.com/state-spaces/mamba/assets/98384255/faf8a069-173c-4787-a2f1-65bb047e04f1) How can this be solved?

At L288 in the `selective_scan_bwd_kernel.cuh` code, `a` is defined as follows: https://github.com/state-spaces/mamba/blob/3b0dde5a20659073af5684e966a81981e614789e/csrc/selective_scan/selective_scan_bwd_kernel.cuh#L288
1) What does `a` store, and why is it defined in this manner?
2) Where is hidden...

```bash
python benchmarks/benchmark_generation_mamba_simple.py --model-name "AntonV/mamba2-130m-hf" --batch 1 --genlen 4096 --promptlen 600
```

Output:

Loading model AntonV/mamba2-130m-hf
Number of parameters: 128989632
Prompt length: 600, generation length: 4096
AntonV/mamba2-130m-hf prompt processing +...

I have set up the environment successfully, but when I run `lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-130m --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256` I get the following error:
> 2024-09-26:16:41:09,923 INFO [__main__.py:251]...

Hi, I was using complex dynamics for an application and was seeing large differences between the gradients computed by `mamba_inner_ref` and `mamba_inner_fn`. The scan functions worked fine and performed much better...

1. I have a BLD-shaped tensor as input to Mamba, and the output is also BLD. This is for training. I want to know: every time a new batch is given, does the...

When using Mamba1, I found that even after fixing all random seeds, the experiments were still not reproducible. Does Mamba include any non-deterministic atomic operations?
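As background to the question above, floating-point addition is not associative, so any reduction whose accumulation order can vary between runs (as can happen with atomic adds on a GPU) may produce different results even with fixed seeds. The sketch below demonstrates only the arithmetic effect in plain Python with values chosen for illustration; it makes no claim about whether Mamba's kernels actually use such operations.

```python
# Three float64 values whose sum depends on the order of accumulation.
big, small, neg_big = 1e16, 1.0, -1e16

# Order A: the small value is absorbed when added to the large one,
# because 1e16 + 1.0 is not representable and rounds back to 1e16.
order_a = (big + small) + neg_big

# Order B: the large values cancel first, so the small value survives.
order_b = (big + neg_big) + small

print(order_a, order_b)  # the two sums differ
```

The same inputs give two different sums purely because of the order in which they were added.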