Tri Dao comments

Results: 438 comments of Tri Dao

Triton Error

Yeah, looks like a Triton error. Idk what's wrong, I'm not an expert in Triton.

Changing expansion_factor resulted in 'c10::DistBackendError

Not that I'm aware of, but usually you want dimensions to be a multiple of 128 or 256 anyway.
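
A minimal sketch of the "multiple of 128 or 256" advice: rounding a candidate dimension up to the nearest multiple. The helper name `round_up_to_multiple` is hypothetical and not part of any library.

```python
def round_up_to_multiple(dim: int, multiple: int = 128) -> int:
    """Return the smallest multiple of `multiple` that is >= `dim`.

    Hypothetical helper for illustration only.
    """
    if dim <= 0 or multiple <= 0:
        raise ValueError("dim and multiple must be positive")
    # Integer ceiling division, then scale back up.
    return ((dim + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    for d in (1000, 1024, 1500):
        print(d, "->", round_up_to_multiple(d, 128), "or", round_up_to_multiple(d, 256))
```

Whether 128 or 256 is the better choice likely depends on the GPU and the kernels involved, so it is worth trying both.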

Mamba-2: IndexError: map::at

Looks like a Triton error, which GPU do you use?

Mamba-2: IndexError: map::at

I'm not sure Triton supports GPUs before Ampere (e.g. 2080) very well.

big scan dim cause cuda error

Limit is probably < 64000.

On the small model, the actual GPU memory usage of Mamba2 is much higher than that of Mamba1.

nn.Conv1d is probably not great for memory usage. You should try to use causal_conv1d.

On the small model, the actual GPU memory usage of Mamba2 is much higher than that of Mamba1.

Huh, there's no requirement that d_state / head_dim % 8 == 0. There is a requirement that d_model / head_dim % 8 == 0. You can try dimensions similar to the language models we've...
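
A small sketch of checking the constraint exactly as it is written in the comment above, read as `(d_model / head_dim) % 8 == 0`. The function name is hypothetical, and the assumption that `d_model` must also divide evenly by `head_dim` is mine; the library's own check may be phrased in terms of other quantities.

```python
def satisfies_head_constraint(d_model: int, head_dim: int) -> bool:
    """Check (d_model / head_dim) % 8 == 0 as stated in the comment.

    Hypothetical helper; assumes head_dim must divide d_model evenly.
    """
    if d_model % head_dim != 0:
        return False
    return (d_model // head_dim) % 8 == 0


if __name__ == "__main__":
    for d_model, head_dim in [(768, 64), (1024, 64), (2048, 64)]:
        print(d_model, head_dim, satisfies_head_constraint(d_model, head_dim))
```

Note that `d_state` does not appear in the check at all, which matches the comment.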

On the small model, the actual GPU memory usage of Mamba2 is much higher than that of Mamba1.

We already have a reference implementation: https://github.com/state-spaces/mamba/blob/8ffd905c91d207f5c0cc84fc2a2fb748655094f0/mamba_ssm/ops/triton/ssd_combined.py#L621

On the small model, the actual GPU memory usage of Mamba2 is much higher than that of Mamba1.

You can always pad the seqlen. The assert `seqlen == nchunks * chunk_size` in the reference is there for simplicity of implementation. This ref version is not used to train...
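
A rough sketch of the padding arithmetic: given a sequence length and a chunk size, compute the padded length that satisfies `seqlen == nchunks * chunk_size`. The helper `padded_seqlen` is hypothetical, and how the extra positions should be filled (and masked or sliced off afterwards) is left to the caller.

```python
def padded_seqlen(seqlen: int, chunk_size: int) -> tuple[int, int]:
    """Return (padded_length, pad_amount) so padded_length is a multiple of chunk_size.

    Hypothetical helper for illustration only.
    """
    if seqlen <= 0 or chunk_size <= 0:
        raise ValueError("seqlen and chunk_size must be positive")
    nchunks = -(-seqlen // chunk_size)  # ceiling division
    padded = nchunks * chunk_size
    return padded, padded - seqlen


if __name__ == "__main__":
    padded, pad = padded_seqlen(1000, 256)
    print(f"pad {pad} positions at the end to reach seqlen {padded}")
```

After running the padded input through the model, the outputs at the padded positions would be discarded by slicing back to the original length.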

CUDA error when using Mamba2 with long context

Are any tensors of size >= 2GB? We use int32 for indexing, so it's possible that an index wraps around the int32 max and becomes negative, causing an illegal memory access (IMA).
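
To make the wraparound concrete, here is a sketch in plain Python that simulates int32 overflow and flags tensor shapes at or above the 2GB mark. Both helper names are hypothetical, and using total bytes as the threshold is a conservative assumption; the kernels may index in elements rather than bytes.

```python
import math

INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Simulate storing `value` in a signed 32-bit integer (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


def is_at_least_2gb(shape: tuple[int, ...], element_size: int) -> bool:
    """Return True if a tensor with this shape and element size is >= 2GB.

    Hypothetical helper; conservative check based on total bytes.
    """
    return math.prod(shape) * element_size >= 2**31


if __name__ == "__main__":
    print(wrap_int32(INT32_MAX))      # still positive
    print(wrap_int32(INT32_MAX + 1))  # wraps to a negative value
    print(is_at_least_2gb((8, 65536, 4096), 2))
```

A negative index of this kind points outside the allocation, which is consistent with the illegal memory access described in the comment.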