Tri Dao comments

Results: 250 comments by Tri Dao

exp the log of A?

We use `self.A_log` as the parameter so that there's no restriction on its value. If we parameterize A directly, it's harder to constrain A to be positive (which is what...
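
To illustrate the point about log-parameterization, here is a minimal sketch in plain Python (not the Mamba code). The stored value is unconstrained, and taking `exp` of it always yields a positive number, so no explicit constraint is needed during optimization. The names `a_log` and `positive_from_log` are hypothetical and chosen only for illustration.

```python
import math


def positive_from_log(a_log):
    """Map unconstrained real values to strictly positive ones via exp."""
    return [math.exp(v) for v in a_log]


# Unconstrained "log" parameters: any real number is allowed, including negatives.
a_log = [-3.0, 0.0, 2.5]
a = positive_from_log(a_log)

# Simulate a large update applied to the unconstrained parameter.
# exp() of the result is still positive, so positivity holds without clamping.
a_log_updated = [v - 10.0 for v in a_log]
a_updated = positive_from_log(a_log_updated)
```

Updating the positive value directly would instead require clamping or projection after each step to keep it above zero.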

ImportError of causal_conv1d with torch 2.2 upgrade

I just pushed a new version of `causal-conv1d`; can you try again?

Possible to use tabular data as input?

We don't have much experience with tabular data but you can try.

Shape of state transition matrix A?

That's not supported in the CUDA code, but you can play around with `selective_scan_ref`, which is written in PyTorch (but is much slower). Instead of multiplying A with previous hidden states pointwise...
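
As a rough illustration of the difference between a pointwise (diagonal) A and a full matrix A, the sketch below writes one step of a simplified linear recurrence `h_t = A h_{t-1} + u_t` both ways in plain Python. This is not the `selective_scan_ref` code. The function names, the simplified recurrence, and the use of a dense matrix as the alternative are assumptions made for illustration.

```python
def step_diagonal(a_diag, h_prev, u):
    """One recurrence step where A acts pointwise: h[i] = a[i] * h_prev[i] + u[i]."""
    return [a * h + x for a, h, x in zip(a_diag, h_prev, u)]


def step_dense(a_mat, h_prev, u):
    """One recurrence step with a full matrix A: h = A @ h_prev + u."""
    return [
        sum(a_ij * h_j for a_ij, h_j in zip(row, h_prev)) + x
        for row, x in zip(a_mat, u)
    ]


h_prev = [2.0, 1.0]
u = [1.0, 0.0]

# Pointwise form: each state dimension evolves independently.
h_diag = step_diagonal([0.5, 0.9], h_prev, u)

# Dense form with the same values on the diagonal gives the same result.
h_same = step_dense([[0.5, 0.0], [0.0, 0.9]], h_prev, u)

# Off-diagonal entries let state dimensions mix, which the pointwise form cannot express.
h_mixed = step_dense([[0.5, 0.2], [0.0, 0.9]], h_prev, u)
```

The pointwise step costs O(N) per update for a state of size N, while the dense step costs O(N^2).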

How to merge weights?

I don't have experience with model merging. I'm keeping this issue open in case others can help.

TypeError: decode() got an unexpected keyword argument 'min_p'

Can you try again with the latest version of `mamba-ssm`? We've just updated it.

Question about exporting Mamba models to GGUF format for Ollama deployment

I'm not familiar with the GGUF format but perhaps others might be able to help.

Is Context Length dependent on training data's context?

The models were trained with 2k context, so it's cool that passkey retrieval works up to 3-4k tokens. It would be cool to train Mamba with longer context and see how it...

Question Regarding Randomness

Did you follow the suggestion in the error message?

Question Regarding Randomness

I'm not sure where the randomness comes from. Can you comment out lines in the Mamba implementation to isolate the source?