mamba
I trained 3 models, but after averaging the weights, the model output is garbled!
I am interested in deploying Mamba within the Ollama framework. I was wondering if there's a way to export Mamba models to GGUF format, or if there are any recommended...
I notice that passkey retrieval works well up to around 3-4k tokens. After that, it doesn't. That wasn't my intuition for SSMs, but I guess context length is still related...
Hi there! I want to train a Mamba language model from scratch on my own dataset. However, there is a problem during the training process. When I set CUDA_LAUNCH_BLOCKING=1...
I understand that Mamba currently requires nvcc (CUDA) to install. Is a version planned for macOS on M1? PyTorch already supports MPS, and it would be nice to have...
Support variable-length sequences for mamba block via `cu_seqlens` in the `forward` pass and `backward` pass, similar to what has been done (such as cumulative sequences `cu_seqlens` or lower triangular block...
Processing /home/ubuntu/causal-conv1d Preparing metadata (setup.py) ... error error: subprocess-exited-with-error × python setup.py egg_info did not run successfully. │ exit code: 1 ╰─> [11 lines of output] Traceback (most recent call...
My environment: causal-conv1d 1.1.1, mamba-ssm 1.2.0.post1, but I installed it on Windows; I do not know why