Tri Dao
There's a reference implementation in PyTorch, but it would probably be quite a bit slower.
It implements the same operation, just more memory-efficient (as the name suggests).
It should compute the same answer.
It's been moved to [mamba_ssm/modules/block.py](https://github.com/state-spaces/mamba/blob/c0a00bd1808881831ddf43206c69362d4df90cf7/mamba_ssm/modules/block.py#L10)
Probably yes. How would you do it with Transformers?
Yes, this is a good idea. The conv1d implementation actually already supports taking in initial states and returning final states. We just haven't had time to wire everything together.
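To illustrate the idea, here's a minimal pure-Python sketch of a causal conv1d that accepts an initial state and returns a final state, so a long sequence can be processed chunk by chunk. This is an assumption-laden illustration, not the repo's fused kernel; the function name and interface are hypothetical.

```python
# Sketch only: a causal 1D convolution that carries state across chunks.
# Not the mamba_ssm kernel; plain Python for illustration.

def causal_conv1d(x, weights, init_state=None):
    """Causal 1D convolution over a sequence x (list of floats).

    init_state: the last (len(weights) - 1) inputs from the previous
    chunk (zeros if this is the first chunk).
    Returns (outputs, final_state); final_state can be passed as
    init_state to the call for the next chunk.
    """
    w = len(weights)
    state = list(init_state) if init_state is not None else [0.0] * (w - 1)
    buf = state + list(x)                      # prepend carried context
    out = [sum(buf[i + j] * weights[j] for j in range(w))
           for i in range(len(x))]             # one output per input
    final_state = buf[len(buf) - (w - 1):]     # last w-1 inputs
    return out, final_state

# Processing in two chunks matches processing the full sequence:
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [0.5, 0.25, 0.25]
full, _ = causal_conv1d(x, w)
y1, s = causal_conv1d(x[:3], w)
y2, _ = causal_conv1d(x[3:], w, init_state=s)
assert full == y1 + y2
```

The point of the sketch is the contract: the final state is exactly the context the next chunk needs, so chunked and full-sequence results agree.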
Yes, chunk size should be a power of 2; that's what Triton supports. To deal with seqlen not divisible by chunk_size, we load with a mask. Anything outside the seqlen...
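The masking idea can be sketched in plain Python (an analogue of Triton's `tl.load(..., mask=...)`; the helper below is hypothetical, not from the repo): positions past the end of the sequence are masked to 0, so they contribute nothing to chunk-wise reductions.

```python
# Sketch only: fixed-size chunk loading with a mask for the tail chunk.

def load_chunk(x, chunk_idx, chunk_size):
    """Return a chunk of exactly chunk_size elements;
    out-of-range positions are masked to 0.0."""
    start = chunk_idx * chunk_size
    offs = range(start, start + chunk_size)
    mask = [i < len(x) for i in offs]
    return [x[i] if m else 0.0 for i, m in zip(offs, mask)]

x = list(range(1, 11))                       # seqlen 10, chunk_size 4
chunks = [load_chunk(x, c, 4) for c in range(3)]
# last chunk is [9, 10, 0, 0]; summing over chunks still equals sum(x)
assert sum(sum(c) for c in chunks) == sum(x)
```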
You can see the reference implementation: https://github.com/state-spaces/mamba/blob/8ffd905c91d207f5c0cc84fc2a2fb748655094f0/mamba_ssm/ops/triton/ssd_chunk_state.py#L960
No, the model architectures are different
Can you post a script that helps us reproduce the error? E.g. save the tensors that produce the NaN?
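A minimal sketch of the kind of repro script being asked for: when a NaN appears, dump the exact inputs so the failure can be replayed. This uses `pickle` on plain lists for illustration; with actual tensors you would use `torch.save` instead. The function name, file name, and stand-in op are all hypothetical.

```python
# Sketch only: dump the inputs that produced a NaN so they can be replayed.
import math
import pickle

def checked_op(inputs):
    out = [v * 2.0 for v in inputs]          # stand-in for the real op
    if any(math.isnan(v) for v in out):
        # Save the offending inputs/outputs for a repro script.
        with open("nan_repro.pkl", "wb") as f:
            pickle.dump({"inputs": inputs, "outputs": out}, f)
        raise ValueError("NaN produced; inputs saved to nan_repro.pkl")
    return out
```

A repro script would then `pickle.load` (or `torch.load`) the dump and call the failing op on exactly those values.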