mamba
Is there a small model trainer, such as llama2.c?
**Edit:**
- [x] Implement bi-directionality by applying `Mamba` module twice: (1) to the forward sequence and (2) to the backward sequence.
- [x] Implement ~3~ 2 strategies for combining forward...
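The "apply the module twice" idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the issue: `causal_scan` is a hypothetical toy recurrence standing in for a real `Mamba` block, and the element-wise sum used to combine the two passes is only an assumed placeholder, since the combining strategies in the original post are truncated.

```python
from typing import Callable, List, Sequence


def causal_scan(xs: Sequence[float], decay: float = 0.5) -> List[float]:
    """Toy causal recurrence h_t = decay * h_{t-1} + x_t.

    Hypothetical stand-in for a Mamba block: position t only sees inputs <= t.
    """
    out: List[float] = []
    h = 0.0
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional(
    xs: Sequence[float],
    block: Callable[[Sequence[float]], List[float]] = causal_scan,
    combine: Callable[[float, float], float] = lambda f, b: f + b,
) -> List[float]:
    """Run `block` on the sequence and on its reversal, then merge per position."""
    fwd = block(xs)
    # Run on the reversed sequence, then flip the result back so that
    # position t of `bwd` lines up with position t of `fwd`.
    bwd = block(xs[::-1])[::-1]
    # Assumed combine step (element-wise sum); swap in any other merge rule.
    return [combine(f, b) for f, b in zip(fwd, bwd)]


result = bidirectional([1.0, 2.0, 3.0])
```

With a symmetric combine rule such as the sum, reversing the input simply reverses the output, which is a quick sanity check that the backward pass is aligned correctly.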
WARNING
WARNING:root:CUDA extension for structured kernels (Cauchy and Vandermonde multiplication) not found. Install by going to extensions/kernels/ and running `python setup.py install`, for improved speed and memory efficiency. Note that the...
The code in `selective_scan_fwd()` of `selective_scan.cpp` seems to suggest that even a forward pass from a trained model requires CUDA, which might be inconvenient when running models in production environments. Any idea how...
Dear @albertfgu and @tridao, Thanks for sharing the awesome work. I'm trying to incorporate mamba into a CNN, and training goes well for the first few epochs (100-750)...
This PR implements masking for left contiguous pad tokens by zeroing out intermediate state values, per the discussion at https://github.com/state-spaces/mamba/issues/66, for all three code paths: non-fused, fused without CUDA graph,...
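To illustrate the general idea of masking left padding by zeroing the state, here is a small sketch. It is not the PR's implementation: `masked_scan` is a hypothetical toy recurrence (not a Mamba kernel), and the 0/1 `mask` argument is an assumed interface in which 0 marks a pad position.

```python
from typing import List, Sequence


def masked_scan(
    xs: Sequence[float], mask: Sequence[int], decay: float = 0.5
) -> List[float]:
    """Toy recurrence h_t = (decay * h_{t-1} + x_t) * mask_t.

    Hypothetical stand-in for a state space layer. Where mask_t == 0
    (a pad token), the state is forced to zero, so left padding cannot
    leak into the states of the real tokens that follow it.
    """
    out: List[float] = []
    h = 0.0
    for x, m in zip(xs, mask):
        h = (decay * h + x) * m
        out.append(h)
    return out


# Two left pad positions (arbitrary values) followed by two real tokens.
padded = masked_scan([9.0, 9.0, 1.0, 2.0], [0, 0, 1, 1])
# The same two real tokens without any padding.
unpadded = masked_scan([1.0, 2.0], [1, 1])
```

In this sketch the outputs at the real-token positions of `padded` match `unpadded`, which is the property the masking is meant to guarantee for left-padded batches.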
Hi. I am trying to install and use mamba, but I can't install causal-conv1d with pip. I then tried to build it from source, but I get the same error...
Dear Mamba-SSM team, congratulations on your success! Obviously many of us are excited about exploring the applications of your work. Since there's no dropout in your model, what do you...
Thanks for the great work, especially the scaling curves! I'm particularly interested in how you trained the Transformer++ models. Is there any public codebase that you used to train them?...
I'm having an issue trying to get mamba running on 3xP40. The model will load into vram but then crashes with "RuntimeError: CUDA error: no kernel image is available for...