mamba issues

Is there a small model trainer?

8

Is there a small model trainer, such as llama2.c?

Implement bi-directionality

8

**Edit:**
- [x] Implement bi-directionality by applying the `Mamba` module twice: (1) to the forward sequence and (2) to the backward sequence.
- [x] Implement ~3~ 2 strategies for combining forward...

yair-schiff
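The two-pass idea described above can be sketched as follows. This is a minimal illustration, not the PR's code. `causal_cumsum` is a hypothetical stand-in for a causal model such as `Mamba`. Both directions here share one model callable, and the snippet does not say whether the PR shares weights. The snippet also cuts off the names of the combination strategies, so the `"add"` and `"multiply"` options are assumptions.

```python
from typing import Callable, List

Seq = List[float]


def bidirectional(model: Callable[[Seq], Seq], x: Seq, strategy: str = "add") -> Seq:
    """Run a causal sequence model forward and backward, then combine per position."""
    fwd = model(x)
    # Run on the reversed sequence, then flip the result back so that position t
    # summarizes x[t:] (the "future") instead of x[:t + 1] (the "past").
    bwd = model(x[::-1])[::-1]
    # Assumed combination strategies; the original list is truncated.
    if strategy == "add":
        return [f + b for f, b in zip(fwd, bwd)]
    if strategy == "multiply":
        return [f * b for f, b in zip(fwd, bwd)]
    raise ValueError(f"unknown strategy: {strategy}")


def causal_cumsum(x: Seq) -> Seq:
    """Hypothetical causal model: output t depends only on x[0..t]."""
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out
```

For example, `bidirectional(causal_cumsum, [1.0, 2.0, 3.0])` gives `[7.0, 8.0, 9.0]`. At every position the result sees the whole sequence, which a single causal pass cannot do.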

WARNING:root:CUDA extension for structured kernels (Cauchy and Vandermonde multiplication) not found. Install by going to extensions/kernels/ and running `python setup.py install`, for improved speed and memory efficiency. Note that the...

aaaaamilk
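For context on what the warning refers to, here is a naive sketch of the two structured products, written straight from their mathematical definitions. It is not the repository's fallback code, and the function names are hypothetical. The Vandermonde product evaluates `y[l] = sum_n v[n] * z[n]**l`. The Cauchy product evaluates `y[m] = sum_n v[n] / (w[m] - z[n])`. Both take O(N·L) time here, which is the cost a dedicated CUDA kernel would aim to cut in speed and memory.

```python
from typing import List


def vandermonde_mult(v: List[complex], z: List[complex], L: int) -> List[complex]:
    """y[l] = sum_n v[n] * z[n]**l for l = 0..L-1 (naive O(N*L) loop)."""
    return [sum(vn * zn ** l for vn, zn in zip(v, z)) for l in range(L)]


def cauchy_mult(v: List[complex], z: List[complex], w: List[complex]) -> List[complex]:
    """y[m] = sum_n v[n] / (w[m] - z[n]) (naive O(N*M) loop; assumes w[m] != z[n])."""
    return [sum(vn / (wm - zn) for vn, zn in zip(v, z)) for wm in w]
```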

Does forward/eval from a trained Mamba model require CUDA as well?

5

The code in `selective_scan_fwd()` in `selective_scan.cpp` seems to suggest that even a forward pass from a trained model requires CUDA, which might be inconvenient when running models in production environments. Any idea how...

shadowleaves
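The recurrence a selective-scan kernel computes is not tied to CUDA, although a fused GPU kernel is much faster. Below is a minimal pure-Python sketch for a single channel with diagonal state matrix `A`. It assumes the discretization commonly described for Mamba: `A_bar = exp(delta_t * A)`, with the simplified rule `B_bar = delta_t * B_t`. The function name and argument layout are hypothetical. It is not the repository's implementation and is far too slow for production use.

```python
import math
from typing import List


def selective_scan_ref_1ch(
    x: List[float],        # input sequence for one channel, length L
    delta: List[float],    # per-step step sizes, length L (assumed already positive)
    A: List[float],        # diagonal state matrix, length N (negative entries decay)
    B: List[List[float]],  # input-dependent B_t, shape L x N
    C: List[List[float]],  # input-dependent C_t, shape L x N
    D: float = 0.0,        # skip-connection weight
) -> List[float]:
    N = len(A)
    h = [0.0] * N  # hidden state starts at zero
    ys = []
    for t, x_t in enumerate(x):
        for n in range(N):
            dA = math.exp(delta[t] * A[n])                # discretized A
            h[n] = dA * h[n] + delta[t] * B[t][n] * x_t   # simplified discretized B
        # Readout: y_t = C_t . h_t + D * x_t
        ys.append(sum(C[t][n] * h[n] for n in range(N)) + D * x_t)
    return ys
```

With `A = [0.0]` and all-ones `delta`, `B`, `C`, the state never decays and the output is a running sum of `x`. A negative `A` makes older inputs fade geometrically.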

Loss NaN after several epochs

18

Dear @albertfgu and @tridao, thanks for sharing the awesome work. I'm trying to incorporate Mamba into a CNN, and training goes well for the first few epochs (100-750)...

JunMa11

Add support for left padding and masking in forward() and generate()

20

This PR implements masking for left contiguous pad tokens by zeroing out intermediate state values, per the discussion at https://github.com/state-spaces/mamba/issues/66, for all three code paths: non-fused, fused without CUDA graph,...

normster
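The masking idea in this PR, zeroing out intermediate state values at left-pad positions, can be illustrated on a scalar linear recurrence `h_t = a*h_{t-1} + b*x_t`. This is a stand-in and not the PR's code. The function name and the constants `a` and `b` are hypothetical. Because the state is forced to zero at every pad position, the first real token starts from `h = 0`, exactly as if the padding were absent. The sketch covers only the recurrence. It does not handle the short causal convolution in a Mamba block, whose window could otherwise still see pad tokens.

```python
from typing import List


def masked_linear_scan(x: List[float], mask: List[int], a: float = 0.9, b: float = 1.0) -> List[float]:
    """Scalar recurrence h_t = a*h_{t-1} + b*x_t, with h_t zeroed where mask_t == 0."""
    h = 0.0
    out = []
    for x_t, m_t in zip(x, mask):
        h = a * h + b * x_t
        if not m_t:
            h = 0.0  # pad position: discard anything the pad token contributed
        out.append(h)
    return out
```

Left-padding `[1.0, 2.0]` with arbitrary pad values, as in `masked_linear_scan([9.0, 9.0, 1.0, 2.0], [0, 0, 1, 1])`, yields `[0.0, 0.0]` followed by the same outputs as the unpadded sequence.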

causal-conv1d installation error

44

Hi. I am trying to install and use Mamba, but I can't install causal-conv1d with pip. I then tried to build it from source, but I get the same error...

emadkavousi

Any suggestions for regularization?

8

Dear Mamba-SSM team, congratulations on your success! Obviously many of us are excited about exploring the applications of your work. Since there's no dropout in your model, what do you...

drscotthawley

Code for training Transformer++

Thanks for the great work, especially the scaling curves! I'm particularly interested in how you trained the Transformer++ models. Is there any public codebase that you used to train them?...

alexlioralexli

RuntimeError: CUDA error: no kernel image is available for execution on the device on 3xP40

5

I'm having an issue trying to get mamba running on 3xP40. The model will load into vram but then crashes with "RuntimeError: CUDA error: no kernel image is available for...

DewEfresh
