Tri Dao

Results 438 comments of Tri Dao

Yeah, looks like a Triton error. Idk what's wrong, I'm not an expert in Triton.

Not that I'm aware of, but usually you want dimensions to be multiples of 128 or 256 anyway.

Looks like a Triton error. Which GPU do you use?

I'm not sure Triton supports GPUs before Ampere (e.g. 2080) very well.

Limit is probably < 64000.

nn.Conv1d is probably not great for memory usage. You should try to use causal_conv1d.

Huh, there's no requirement that d_state / head_dim % 8 == 0. There's d_model / head_dim % 8 == 0. You can try dimensions similar to the language models we've...

We already have a reference implementation: https://github.com/state-spaces/mamba/blob/8ffd905c91d207f5c0cc84fc2a2fb748655094f0/mamba_ssm/ops/triton/ssd_combined.py#L621

You can always pad the seqlen. The assert `seqlen == nchunks * chunk_size` for the reference is there for simplicity of implementation. This ref version is not used to train...
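A minimal sketch of the padding arithmetic, in case it helps: it rounds the seqlen up to the next multiple of `chunk_size` so that the assert holds. The helper name `pad_to_chunk_multiple` and the example values (1000, 256) are made up for illustration and are not part of the repo.

```python
def pad_to_chunk_multiple(seqlen: int, chunk_size: int) -> tuple:
    """Return (padded_seqlen, nchunks) with padded_seqlen == nchunks * chunk_size.

    Hypothetical helper for illustration only; not a function from mamba_ssm.
    """
    if seqlen <= 0 or chunk_size <= 0:
        raise ValueError("seqlen and chunk_size must be positive")
    # Ceiling division: number of chunks needed to cover the whole sequence.
    nchunks = -(-seqlen // chunk_size)
    return nchunks * chunk_size, nchunks


seqlen, chunk_size = 1000, 256  # example values, chosen arbitrarily
padded_seqlen, nchunks = pad_to_chunk_multiple(seqlen, chunk_size)
pad_amount = padded_seqlen - seqlen  # extra positions to append at the end

# The condition the reference implementation asserts now holds.
assert padded_seqlen == nchunks * chunk_size
```

The inputs would then be padded by `pad_amount` along the sequence dimension (for example with `torch.nn.functional.pad`), and the padded positions sliced off the output afterwards.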

Are any tensors of size >= 2GB? We use int32 for indexing, so it's possible that an index wraps around the int32 maximum and becomes negative, causing an illegal memory access (IMA).
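To illustrate the wraparound, here is a rough pure-Python sketch that emulates signed 32-bit arithmetic. It assumes offsets are counted in elements over a contiguous tensor, which may not match what a given kernel does (some compute byte offsets or use strides), so treat the check as a heuristic. The names `wrap_int32` and `fits_int32_indexing` are made up for this example.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(offset: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (offset + 2**31) % 2**32 - 2**31


def fits_int32_indexing(shape) -> bool:
    """True if the largest element offset of a contiguous tensor fits in int32.

    Assumption: offsets are in elements, not bytes.
    """
    numel = 1
    for dim in shape:
        numel *= dim
    return numel - 1 <= INT32_MAX


# Small offsets are unchanged, but one past INT32_MAX becomes negative.
assert wrap_int32(5) == 5
assert wrap_int32(INT32_MAX + 1) == -2**31

# 2**31 elements: the last offset is exactly INT32_MAX, so it still fits.
assert fits_int32_indexing((4, 8192, 65536))
# 2**32 elements: offsets past INT32_MAX would wrap to negative values.
assert not fits_int32_indexing((8, 8192, 65536))
```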