mamba
If the local conv (or "causal conv1d") is intended to shift the tokens by 1, shouldn't this be `padding=d_conv` rather than `padding=d_conv - 1`? (Or can...
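For context, a minimal sketch (mine, not the repo's code) of why `padding=d_conv - 1` plus truncating the output back to `seqlen` yields a causal convolution that does not shift tokens: output position `t` sees only inputs at positions `<= t`.

```
import torch
import torch.nn as nn

d_model, d_conv, seqlen = 8, 4, 16
conv = nn.Conv1d(d_model, d_model, kernel_size=d_conv,
                 padding=d_conv - 1, groups=d_model)

x = torch.randn(1, d_model, seqlen)
y = conv(x)[..., :seqlen]  # drop the d_conv - 1 trailing positions

# Causality check: zeroing x from the midpoint onward must not change y before it.
x2 = x.clone()
x2[..., seqlen // 2:] = 0.0
y2 = conv(x2)[..., :seqlen]
assert torch.allclose(y[..., :seqlen // 2], y2[..., :seqlen // 2])
```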
The following shows the actual values of the input-dependent $\Delta$ during inference of the 2-layer network on the induction-heads task described in the paper. I successfully trained the model to...
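One way to extract these values is a hypothetical inspection sketch like the one below, which recomputes $\Delta$ from a trained layer's own weights, mirroring the reference path in `mamba_simple.py` ($\Delta = \mathrm{softplus}(\texttt{dt\_proj}(\cdot) + \text{bias})$ applied to the first `dt_rank` channels of `x_proj`); the attribute names are taken from that file, and this is not part of the library's public API.

```
import torch
import torch.nn.functional as F
from einops import rearrange
from mamba_ssm import Mamba

layer = Mamba(d_model=64)
x = torch.randn(1, 32, 64)  # (batch, seqlen, d_model)

with torch.no_grad():
    xz = layer.in_proj(x)                              # (b, l, 2 * d_inner)
    x_branch, _ = xz.chunk(2, dim=-1)                  # keep the SSM branch
    x_branch = rearrange(x_branch, "b l d -> b d l")
    x_branch = F.silu(layer.conv1d(x_branch)[..., : x.shape[1]])  # causal conv
    x_dbl = layer.x_proj(rearrange(x_branch, "b d l -> (b l) d"))
    dt = x_dbl[:, : layer.dt_rank]                     # low-rank dt features
    # Delta = softplus(dt_proj(dt)), with the bias folded in as in the paper
    delta = F.softplus(dt @ layer.dt_proj.weight.t() + layer.dt_proj.bias)
    print(delta.shape)  # (batch * seqlen, d_inner)
```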
First of all, thanks for your amazing work on Mamba. This is probably one of the most exciting papers I have read this year! I am opening this issue to...
1. I tried a vanilla PyTorch training loop using bfloat16, and the loss overflowed: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-bf16.ipynb
2. So I tried a vanilla PyTorch training loop using fp32, and the loss is fine: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-fp32.ipynb
3. ...
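A common workaround (a sketch of my own, not the issue author's code) is mixed precision rather than casting the whole model to bfloat16: keep the parameters in fp32, let autocast run matmuls in bf16, and accumulate the loss in fp32.

```
import torch

model = torch.nn.Linear(512, 512).cuda()          # stand-in for the Mamba LM
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 512, device="cuda")
target = torch.randn(8, 512, device="cuda")

# Parameters stay fp32; only the forward matmuls run in bfloat16.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(x)
    loss = torch.nn.functional.mse_loss(out.float(), target)  # loss in fp32

loss.backward()
opt.step()
opt.zero_grad()
```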
hi! thanks for your great models. Since getting it running can be a challenge, I wrote an [Apptainer](https://apptainer.org/) definition file like so, which makes it very easy to run on HPC...
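The actual definition file is truncated above; a hypothetical minimal sketch of such a file, assuming a CUDA-enabled PyTorch base image with `nvcc` available (required to build the extensions), might look like:

```
Bootstrap: docker
From: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel

%post
    pip install packaging
    pip install causal-conv1d mamba-ssm

%runscript
    exec python "$@"
```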
Hi, can you please share the pipeline for the wikitext dataset? I found results of 16.3 perplexity for mamba and 18 (vs. 18.6 everywhere else) for the transformer baseline, and can...
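A hypothetical evaluation sketch (not the authors' pipeline) for token-level perplexity on the WikiText-103 test set, chunked into fixed-length segments; note that perplexity numbers are only comparable when the tokenizer and segmenting match, which may explain discrepancies like 18 vs. 18.6.

```
import math
import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m").to("cuda").eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-103-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids.cuda()

seqlen, nll, n_tokens = 2048, 0.0, 0
with torch.no_grad():
    for i in range(0, ids.shape[1] - 1, seqlen):
        chunk = ids[:, i : i + seqlen + 1]
        logits = model(chunk[:, :-1]).logits
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), chunk[:, 1:].reshape(-1),
            reduction="sum")
        nll += loss.item()
        n_tokens += chunk.shape[1] - 1
print("ppl:", math.exp(nll / n_tokens))
```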
I'm able to compile [causal-conv1d](https://github.com/Dao-AILab/causal-conv1d) by adding `-DWIN32_LEAN_AND_MEAN` to the nvcc flags. When compiling mamba, after adding `-DWIN32_LEAN_AND_MEAN` to the nvcc flags, I find I need to add ...
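For readers hitting the same issue, an illustrative sketch (not the repo's actual `setup.py`) of where such per-compiler flags live in a torch `CUDAExtension` build, with the flag from the report:

```
from torch.utils.cpp_extension import CUDAExtension

ext = CUDAExtension(
    name="selective_scan_cuda",
    sources=["csrc/selective_scan/selective_scan.cpp"],  # list truncated
    extra_compile_args={
        "cxx": ["-O3"],
        # WIN32_LEAN_AND_MEAN trims windows.h, avoiding macro clashes
        # (e.g. min/max) that can break the CUDA sources under MSVC.
        "nvcc": ["-O3", "-DWIN32_LEAN_AND_MEAN"],
    },
)
```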
Thanks for the great work! I notice that:

- [your installation](https://github.com/state-spaces/mamba#installation) states that the lowest supported CUDA version is 11.6.
- mamba [v1.0.1](https://github.com/state-spaces/mamba/releases/tag/v1.0.1) supports CUDA 11.8 and CUDA 12.2.

Is it...
In your paper, you mention that the Mamba scan is faster than FlashAttention-2. Does that mean comparing https://github.com/state-spaces/mamba/blob/0131c1e94a46fc9f70bcfc9d57962963bb2f0b9e/mamba_ssm/ops/selective_scan_interface.py#L14 with https://github.com/Dao-AILab/flash-attention/blob/9356a1c0389660d7e231ff3163c1ac17d9e3824a/flash_attn/flash_attn_interface.py#L432? The inputs of these two modules are different, so is this...
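A generic CUDA timing harness (a sketch; the paper's benchmark setup may differ) that times any callable with CUDA events after a warmup, which is how one could compare `selective_scan_fn` against `flash_attn_func` on matched batch/seqlen/width even though their input signatures differ:

```
import torch

def bench(fn, *args, warmup=10, iters=100):
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

# Example with a stand-in op; substitute selective_scan_fn / flash_attn_func
# with appropriately shaped inputs.
x = torch.randn(8, 2048, 2048, device="cuda")
print(f"{bench(torch.nn.functional.silu, x):.3f} ms")
```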
Hi Albert & Tri, Awesome work! Thank you for sharing! I noticed a bug when using bias=True in a Mamba block. For example, using: ``` layer = Mamba( # This...