mamba
In line 72, there is ` dt = tl.minimum(tl.maximum(dt, dt_min), dt_max)` Why this is needed?
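For reference, the line in question is an elementwise clamp of `dt` into the interval `[dt_min, dt_max]`. The pure-Python sketch below only shows what the expression computes, not why the kernel needs it. The bound values in the usage example are illustrative assumptions, not the ones used in the kernel.

```python
def clamp_dt(dt: float, dt_min: float, dt_max: float) -> float:
    # Same composition as tl.minimum(tl.maximum(dt, dt_min), dt_max):
    # first raise dt to at least dt_min, then cap the result at dt_max.
    return min(max(dt, dt_min), dt_max)


# Illustrative bounds (assumed, not taken from the kernel).
DT_MIN, DT_MAX = 0.001, 0.1
print(clamp_dt(1e-5, DT_MIN, DT_MAX))  # too small: raised to DT_MIN
print(clamp_dt(0.05, DT_MIN, DT_MAX))  # already in range: unchanged
print(clamp_dt(5.0, DT_MIN, DT_MAX))   # too large: capped at DT_MAX
```

Values already inside the range pass through unchanged; only out-of-range step sizes are modified.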
I didn't quite understand the connection between the SSD framework and the Mamba 2 architecture. Reading the article, I got the impression that the SSM block in Mamba 2, differently...
I can reach an accuracy of 0.93 just by changing the base model to google-bert/bert-base-uncased. I downloaded the model (state-spaces/mamba-370m-hf) from https://huggingface.co/state-spaces/mamba-370m-hf. Here's my code: # Training a Classification...
data:image/s3,"s3://crabby-images/b96b1/b96b177c8d2097175352fb17c1d7e9a1bf19a89f" alt="image" 这该如何解决啊
At L288 in `selective_scan_bwd_kernel.cuh` code , `a` is defined as follows : https://github.com/state-spaces/mamba/blob/3b0dde5a20659073af5684e966a81981e614789e/csrc/selective_scan/selective_scan_bwd_kernel.cuh#L288 1) What does `a` store, and why is it defined in this manner? 2) Where is hidden...
```bash python benchmarks/benchmark_generation_mamba_simple.py --model-name "AntonV/mamba2-130m-hf" --batch 1 --genlen 4096 --promptlen 600 ``` Output: Loading model AntonV/mamba2-130m-hf Number of parameters: 128989632 Prompt length: 600, generation length: 4096 AntonV/mamba2-130m-hf prompt processing +...
i have set up the environmet succesfully, but when i run `lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-130m --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,winogrande,openbookqa --device cuda --batch_size 256` i got error as > 2024-09-26:16:41:09,923 INFO [__main__.py:251]...
Hi, I was using complex dynamics for an application, and was seeing large differences in gradients computed by `mamba_inner_ref` and `mamba_inner_fn`, the scan functions worked fine and performed much better...
1. I have a tensor of shape BLD (batch, length, dim) as input to Mamba, and the output is also BLD. This is for training. I want to know: every time a new batch is given, does the...
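A toy mental model can help frame questions about state across batches. The sketch below is a hypothetical scalar linear recurrence `h_t = a*h_{t-1} + b*x_t` over a `(batch, length)` input (the `D` dimension is dropped for simplicity). It is not `mamba_ssm`'s selective scan: it explicitly assumes every sequence starts from a zero state and that nothing is carried between calls. Whether the real layer matches that assumption in a given setup is exactly what the question asks.

```python
from typing import List


def toy_scan(x: List[List[float]], a: float, b: float) -> List[List[float]]:
    """Hypothetical toy recurrence h_t = a*h_{t-1} + b*x_t, applied per sequence.

    Assumption made explicit: the state h is reset to 0.0 at the start of every
    sequence, so no information flows between sequences or between calls.
    """
    out: List[List[float]] = []
    for seq in x:
        h = 0.0  # fresh zero state for each sequence in the batch
        ys: List[float] = []
        for xt in seq:
            h = a * h + b * xt
            ys.append(h)
        out.append(ys)
    return out


batch = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y1 = toy_scan(batch, a=0.5, b=1.0)
y2 = toy_scan(batch, a=0.5, b=1.0)  # same input, same output: no carried state
print(y1)
```

Under this assumption the output for a batch depends only on that batch, so feeding it twice gives identical results.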
When using Mamba1, I found that even after fixing all random seeds, the experiments were still not reproducible. Does Mamba include any non-deterministic atomic operations?
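Whether Mamba's kernels use atomic operations is the open question here. The sketch below only illustrates why order-dependent floating-point accumulation, which atomic adds from many threads can introduce, breaks bitwise reproducibility even when every random seed is fixed: floating-point addition is not associative, so summing the same values in a different order can give a different result.

```python
import itertools
from typing import List, Sequence, Set


def sum_in_order(values: List[float], order: Sequence[int]) -> float:
    # Accumulate values in the given order, mimicking threads that
    # happen to finish (and add their contribution) in that order.
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]
results: Set[float] = {
    sum_in_order(values, order) for order in itertools.permutations(range(len(values)))
}
print(results)  # more than one distinct sum for the same three numbers
```

The differences are tiny per step, but in training they can compound across iterations into visibly different runs.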