mamba
If the local conv (or "causal conv1d") is intended to shift the tokens by 1, shouldn't this be `padding=d_conv` rather than `padding=d_conv - 1`? (Or can...
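For context, a minimal sketch (mine, not the repo's code) of why `padding=d_conv - 1` plus truncating the output back to `seqlen` yields a causal convolution that does not shift tokens: output position `t` sees only inputs at positions `<= t`.

```
import torch
import torch.nn as nn

d_model, d_conv, seqlen = 8, 4, 16
conv = nn.Conv1d(d_model, d_model, kernel_size=d_conv,
                 padding=d_conv - 1, groups=d_model)

x = torch.randn(1, d_model, seqlen)
y = conv(x)[..., :seqlen]  # drop the d_conv - 1 trailing positions

# Causality check: zeroing x from the midpoint onward must not change y before it.
x2 = x.clone()
x2[..., seqlen // 2:] = 0.0
y2 = conv(x2)[..., :seqlen]
assert torch.allclose(y[..., :seqlen // 2], y2[..., :seqlen // 2])
```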
The following shows the actual values of the input-dependent $\Delta$ during inference of the 2-layer network on the induction-heads task described in the paper. I successfully trained the model to...
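One way to extract these values is a hypothetical inspection sketch like the one below, which recomputes $\Delta$ from a trained layer's own weights, mirroring the reference path in `mamba_simple.py` ($\Delta = \mathrm{softplus}(\texttt{dt\_proj}(\cdot) + \text{bias})$ applied to the first `dt_rank` channels of `x_proj`); the attribute names are taken from that file, and this is not part of the library's public API.

```
import torch
import torch.nn.functional as F
from einops import rearrange
from mamba_ssm import Mamba

layer = Mamba(d_model=64)
x = torch.randn(1, 32, 64)  # (batch, seqlen, d_model)

with torch.no_grad():
    xz = layer.in_proj(x)                              # (b, l, 2 * d_inner)
    x_branch, _ = xz.chunk(2, dim=-1)                  # keep the SSM branch
    x_branch = rearrange(x_branch, "b l d -> b d l")
    x_branch = F.silu(layer.conv1d(x_branch)[..., : x.shape[1]])  # causal conv
    x_dbl = layer.x_proj(rearrange(x_branch, "b d l -> (b l) d"))
    dt = x_dbl[:, : layer.dt_rank]                     # low-rank dt features
    # Delta = softplus(dt_proj(dt)), with the bias folded in as in the paper
    delta = F.softplus(dt @ layer.dt_proj.weight.t() + layer.dt_proj.bias)
    print(delta.shape)  # (batch * seqlen, d_inner)
```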
First of all, thanks for your amazing work on Mamba. This is probably one of the most exciting papers I have read this year! I am opening this issue to...
1. I tried a vanilla PyTorch training loop using bfloat16, and the loss overflowed: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-bf16.ipynb
2. So I tried a vanilla PyTorch training loop using fp32, and the loss is fine: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-fp32.ipynb
3. ...
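A common workaround (a sketch of my own, not the issue author's code) is mixed precision rather than casting the whole model to bfloat16: keep the parameters in fp32, let autocast run matmuls in bf16, and accumulate the loss in fp32.

```
import torch

model = torch.nn.Linear(512, 512).cuda()          # stand-in for the Mamba LM
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 512, device="cuda")
target = torch.randn(8, 512, device="cuda")

# Parameters stay fp32; only the forward matmuls run in bfloat16.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(x)
    loss = torch.nn.functional.mse_loss(out.float(), target)  # loss in fp32

loss.backward()
opt.step()
opt.zero_grad()
```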
hi! thanks for your great models. Since getting it running can be a challenge, I wrote an [Apptainer](https://apptainer.org/) definition file like so, which makes it very easy to run on HPC...
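The actual definition file is truncated above; a hypothetical minimal sketch of such a file, assuming a CUDA-enabled PyTorch base image with `nvcc` available (required to build the extensions), might look like:

```
Bootstrap: docker
From: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel

%post
    pip install packaging
    pip install causal-conv1d mamba-ssm

%runscript
    exec python "$@"
```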
Hi, can you please share the pipeline for the wikitext dataset? I found results of 16.3 perplexity for mamba and 18 (vs. 18.6 everywhere else) for the transformer baseline, and can...
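A hypothetical evaluation sketch (not the authors' pipeline) for token-level perplexity on the WikiText-103 test set, chunked into fixed-length segments; note that perplexity numbers are only comparable when the tokenizer and segmenting match, which may explain discrepancies like 18 vs. 18.6.

```
import math
import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m").to("cuda").eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-103-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids.cuda()

seqlen, nll, n_tokens = 2048, 0.0, 0
with torch.no_grad():
    for i in range(0, ids.shape[1] - 1, seqlen):
        chunk = ids[:, i : i + seqlen + 1]
        logits = model(chunk[:, :-1]).logits
        loss = torch.nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), chunk[:, 1:].reshape(-1),
            reduction="sum")
        nll += loss.item()
        n_tokens += chunk.shape[1] - 1
print("ppl:", math.exp(nll / n_tokens))
```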
I'm able to compile [causal-conv1d](https://github.com/Dao-AILab/causal-conv1d) by adding `-DWIN32_LEAN_AND_MEAN` to the nvcc flags. When compiling mamba, after adding `-DWIN32_LEAN_AND_MEAN` to the nvcc flags, I find I need to add ...
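For readers hitting the same issue, an illustrative sketch (not the repo's actual `setup.py`) of where such per-compiler flags live in a torch `CUDAExtension` build, with the flag from the report:

```
from torch.utils.cpp_extension import CUDAExtension

ext = CUDAExtension(
    name="selective_scan_cuda",
    sources=["csrc/selective_scan/selective_scan.cpp"],  # list truncated
    extra_compile_args={
        "cxx": ["-O3"],
        # WIN32_LEAN_AND_MEAN trims windows.h, avoiding macro clashes
        # (e.g. min/max) that can break the CUDA sources under MSVC.
        "nvcc": ["-O3", "-DWIN32_LEAN_AND_MEAN"],
    },
)
```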
Thanks for the great work! I notice that:

- [your installation](https://github.com/state-spaces/mamba#installation) states that the lowest supported CUDA version is 11.6.
- mamba [v1.0.1](https://github.com/state-spaces/mamba/releases/tag/v1.0.1) supports CUDA 11.8 and CUDA 12.2.

Is it...
In your paper, you mention that the Mamba scan is faster than FlashAttention-2. Does that mean comparing https://github.com/state-spaces/mamba/blob/0131c1e94a46fc9f70bcfc9d57962963bb2f0b9e/mamba_ssm/ops/selective_scan_interface.py#L14 with https://github.com/Dao-AILab/flash-attention/blob/9356a1c0389660d7e231ff3163c1ac17d9e3824a/flash_attn/flash_attn_interface.py#L432? The inputs of these two modules are different, so is this...
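A generic CUDA timing harness (a sketch; the paper's benchmark setup may differ) that times any callable with CUDA events after a warmup, which is how one could compare `selective_scan_fn` against `flash_attn_func` on matched batch/seqlen/width even though their input signatures differ:

```
import torch

def bench(fn, *args, warmup=10, iters=100):
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

# Example with a stand-in op; substitute selective_scan_fn / flash_attn_func
# with appropriately shaped inputs.
x = torch.randn(8, 2048, 2048, device="cuda")
print(f"{bench(torch.nn.functional.silu, x):.3f} ms")
```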
Hi Albert & Tri, Awesome work! Thank you for sharing! I noticed a bug when using bias=True in a Mamba block. For example, using: ``` layer = Mamba( # This...