Results: 311 mamba issues, sorted by recently updated

If the local conv (or "causal conv1d") is intended to shift the tokens by 1, then this should be `padding=d_conv` instead of `padding=d_conv - 1`, shouldn't it? (Or can...
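A minimal sketch of the convention the question refers to, assuming the block applies `nn.Conv1d(..., padding=d_conv - 1)` and then truncates the output back to `seqlen` (as `mamba_simple.py` appears to do): with that pattern the conv is causal and includes the current token, rather than shifting everything by one position.

```python
import torch
import torch.nn as nn

# Hypothetical standalone check of the causal-conv1d padding convention:
# padding = d_conv - 1, then truncate the extra positions on the right.
d_model, d_conv, seqlen = 4, 4, 8
conv = nn.Conv1d(d_model, d_model, kernel_size=d_conv,
                 padding=d_conv - 1, groups=d_model, bias=False)

x = torch.randn(1, d_model, seqlen)
y = conv(x)[..., :seqlen]          # output truncated back to seqlen

# Causality check: perturbing token t must not change outputs at positions < t.
t = 5
x2 = x.clone()
x2[..., t] += 1.0
y2 = conv(x2)[..., :seqlen]
print(torch.allclose(y[..., :t], y2[..., :t]))   # True: positions before t are unchanged
print(torch.allclose(y[..., t], y2[..., t]))     # False: position t does see token t
```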

The following shows the actual values of the input-dependent $\Delta$ during inference of the 2-layer network on the induction-heads task described in the paper. I successfully trained the model to...

First of all, thanks for your amazing work on Mamba. This is probably one of the most exciting papers I have read this year! I am opening this issue to...

1. I tried a vanilla PyTorch training loop using bfloat16 and the loss overflowed: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-bf16.ipynb 2. So I tried a vanilla PyTorch training loop using fp32 and the loss is fine: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-fp32.ipynb 3. ...
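A note for anyone hitting the same overflow: one common pattern that avoids bf16 blow-ups is to keep the master weights and the loss in fp32 and only run the forward pass under bf16 autocast, rather than casting the whole model to bfloat16. A minimal sketch with a placeholder model and data (not the notebooks linked above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model and data; the point is the precision handling, not the architecture.
vocab, d_model = 32000, 256
model = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, vocab)).cuda()  # fp32 weights
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

input_ids = torch.randint(0, vocab, (4, 512), device="cuda")
labels = input_ids.clone()

for step in range(10):
    # Forward in bf16 via autocast; parameters stay in fp32.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(input_ids)
    # Compute the loss in fp32 for stability.
    loss = F.cross_entropy(logits.float().view(-1, vocab), labels.view(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```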

Hi! Thanks for your great models. Since getting it running can be a challenge, I wrote an [Apptainer](https://apptainer.org/) definition file like so, which makes it very easy to run on HPC...

Hi, can you please share the pipeline for the wikitext dataset? I found perplexity results of 16.3 for mamba and 18 (vs. 18.6 everywhere else) for the transformer baseline and can...
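In the absence of the exact pipeline, one common way to compute WikiText perplexity is to concatenate the raw test split, tokenize with the GPT-NeoX tokenizer (the one the Mamba checkpoints use), and average token-level cross-entropy over fixed-length windows. A rough sketch under those assumptions (dataset variant, window length, and stride are my choices, not necessarily the paper's protocol, so the numbers may not match):

```python
import math
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device=device)
model.eval()

# Concatenate the raw test split and tokenize once (assumed protocol).
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids.to(device)

seq_len, nll, n_tokens = 2048, 0.0, 0
with torch.no_grad():
    for i in range(0, ids.size(1) - 1, seq_len):
        chunk = ids[:, i : i + seq_len + 1]
        logits = model(chunk[:, :-1]).logits
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               chunk[:, 1:].reshape(-1), reduction="sum")
        nll += loss.item()
        n_tokens += chunk.size(1) - 1

print("perplexity:", math.exp(nll / n_tokens))
```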

I'm able to compile [causal-conv1d](https://github.com/Dao-AILab/causal-conv1d) by adding ``` "-DWIN32_LEAN_AND_MEAN", ``` to the nvcc flags. When compiling mamba, after adding `-DWIN32_LEAN_AND_MEAN` to the nvcc flags, I find I need to add ```...
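For anyone else on Windows, the define goes wherever the build script assembles its nvcc flags. A rough, illustrative sketch of what such an edit can look like in a `setup.py` that builds with torch's `CUDAExtension` (the flag list and source files here are placeholders, not a verbatim copy of the repo's `setup.py`):

```python
# Illustrative fragment only; the real setup.py has more flags and arch handling.
from torch.utils.cpp_extension import CUDAExtension

nvcc_flags = [
    "-O3",
    "--use_fast_math",
    "-DWIN32_LEAN_AND_MEAN",   # the Windows-only define discussed above
]

ext_modules = [
    CUDAExtension(
        name="selective_scan_cuda",
        sources=["csrc/selective_scan/selective_scan.cpp"],  # plus the .cu kernels in the real build
        extra_compile_args={"cxx": ["-O3"], "nvcc": nvcc_flags},
    )
]
```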

Thanks for the great work! I notice that:

- [your installation](https://github.com/state-spaces/mamba#installation) states that the lowest CUDA version is 11.6.
- mamba [v1.0.1](https://github.com/state-spaces/mamba/releases/tag/v1.0.1) supports CUDA 11.8 and CUDA 12.2.

Is it...

In your paper, you mention that the mamba scan is faster than FlashAttention-2. Does that mean comparing https://github.com/state-spaces/mamba/blob/0131c1e94a46fc9f70bcfc9d57962963bb2f0b9e/mamba_ssm/ops/selective_scan_interface.py#L14 with https://github.com/Dao-AILab/flash-attention/blob/9356a1c0389660d7e231ff3163c1ac17d9e3824a/flash_attn/flash_attn_interface.py#L432? The inputs of these two modules are different; is this...
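One rough way to look at this yourself is to time the two kernels in isolation on shapes that correspond to the same model width: Mamba's scan runs over `d_inner = expand * d_model` channels, while attention runs over `d_model` split into heads. The shapes, dtypes, and timing loop below are my assumptions for a quick micro-benchmark, not the authors' benchmark setup:

```python
import torch
from mamba_ssm.ops.selective_scan_interface import selective_scan_fn
from flash_attn import flash_attn_func

def bench(fn, iters=100):
    # Simple CUDA event timing with warmup; returns milliseconds per call.
    for _ in range(10):
        fn()
    start, end = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

bsz, seqlen, d_model, expand, d_state, nheads = 2, 2048, 2048, 2, 16, 16
d_inner, headdim = expand * d_model, d_model // nheads

# Selective-scan inputs, shaped as inside the Mamba block (fp32 here).
u = torch.randn(bsz, d_inner, seqlen, device="cuda")
delta = torch.rand_like(u)
A = -torch.rand(d_inner, d_state, device="cuda")
B = torch.randn(bsz, d_state, seqlen, device="cuda")
C = torch.randn(bsz, d_state, seqlen, device="cuda")

# FlashAttention-2 inputs: (batch, seqlen, nheads, headdim) in fp16.
q, k, v = (torch.randn(bsz, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
           for _ in range(3))

print("selective_scan_fn:", bench(lambda: selective_scan_fn(u, delta, A, B, C, delta_softplus=True)), "ms")
print("flash_attn_func:  ", bench(lambda: flash_attn_func(q, k, v, causal=True)), "ms")
```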

Hi Albert & Tri, awesome work! Thank you for sharing! I noticed a bug when using `bias=True` in a Mamba block. For example, using: ``` layer = Mamba( # This...
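For anyone trying to reproduce, a minimal sketch of the kind of check this likely refers to: instantiate a block with `bias=True` and compare the fused fast path against the reference path (`use_fast_path=False`), since a discrepancy there would point at how the bias is handled in the fused kernel. The constructor arguments below are just an example, not the original poster's code:

```python
import torch
from mamba_ssm import Mamba

torch.manual_seed(0)
x = torch.randn(2, 64, 256, device="cuda")   # (batch, seqlen, d_model)

# Same weights, one block using the fused fast path and one using the reference path.
fast = Mamba(d_model=256, d_state=16, d_conv=4, expand=2, bias=True,
             use_fast_path=True).cuda()
ref = Mamba(d_model=256, d_state=16, d_conv=4, expand=2, bias=True,
            use_fast_path=False).cuda()
ref.load_state_dict(fast.state_dict())

with torch.no_grad():
    y_fast, y_ref = fast(x), ref(x)

# If the bias were handled consistently, these should agree up to numerical tolerance.
print((y_fast - y_ref).abs().max())
```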