Tri Dao
Sure, I'll create a Docker image and/or Colab this weekend. I'm a bit swamped with deadlines until Friday.
Thank you @MatthieuTPHR, super excited to see ideas on fast & memory-efficient attention having an impact!
> I tried to compare [my code](https://gist.github.com/buttercutter/b3331ca1fd9e2f5871b0eded6b758f39) with [your code](https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py) as well as [@johnma2006 's code](https://github.com/johnma2006/candle/blob/main/candle/models/mamba/model.py#L195) line-by-line, taking the three code files in perspective, but there seem to be no successful findings...
Yup, it's only implemented for CUDA for now. You can look at `selective_scan_ref` for the pure PyTorch implementation that should run on CPU (though probably quite slowly).
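Something along these lines should run on CPU (a rough sketch; the tensor shapes follow the reference implementation's docstring, so double-check the exact signature in `mamba_ssm/ops/selective_scan_interface.py`):

```python
import torch
from mamba_ssm.ops.selective_scan_interface import selective_scan_ref

# Assumed shapes: u, delta are (batch, d_inner, seqlen); A is (d_inner, d_state);
# B, C are (batch, d_state, seqlen); D is (d_inner,).
batch, d_inner, d_state, seqlen = 2, 64, 16, 128
u = torch.randn(batch, d_inner, seqlen)      # input sequence
delta = torch.rand(batch, d_inner, seqlen)   # timestep, kept positive via softplus below
A = -torch.rand(d_inner, d_state)            # negative real part for a stable state matrix
B = torch.randn(batch, d_state, seqlen)
C = torch.randn(batch, d_state, seqlen)
D = torch.randn(d_inner)

# Pure-PyTorch reference scan: runs on CPU, but much slower than the CUDA kernel.
y = selective_scan_ref(u, delta, A, B, C, D=D, delta_softplus=True)
print(y.shape)  # expected: (batch, d_inner, seqlen)
```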
You can use dropout, just like in Transformers. It's not implemented here but you can add it.
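For example, something like this sketch would put dropout on the block output, the same way you would after an attention or MLP layer in a Transformer (the `Mamba` constructor arguments here are kept at defaults and are illustrative, not the only place dropout could go):

```python
import torch
import torch.nn as nn
from mamba_ssm.modules.mamba_simple import Mamba

class MambaBlockWithDropout(nn.Module):
    """Mamba mixer followed by dropout, analogous to attention + dropout in a Transformer."""
    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.mixer = Mamba(d_model=d_model)   # other Mamba args left at their defaults
        self.dropout = nn.Dropout(dropout)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seqlen, d_model)
        return self.dropout(self.mixer(hidden_states))
```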
Are you sure it's from this repo? I did a search for "Cauchy" and found nothing.
I think the warning is from the s4 repo.
I think I've seen [it](https://github.com/HazyResearch/flash-attention/issues/21). I haven't figured out the cause, but I think it's some combination of gcc version and nvcc version.
I think I've fixed the error "internal compiler error: in maybe_undo_parenthesized_ref" with this [commit](https://github.com/HazyResearch/flash-attention/commit/8a2ece89f7bd5d3124a6cae5fd95db5e85f07ee6) in the flash-attention repo.
You'd probably want to write a MambaClassifierHeadModel that has a similar structure: a Mamba model backbone with a classifier head.
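A rough sketch of what that could look like (the class name, layer structure, and mean pooling here are illustrative assumptions, not the internals of `MambaLMHeadModel`; adjust norms, pooling, and hyperparameters to your task):

```python
import torch
import torch.nn as nn
from mamba_ssm.modules.mamba_simple import Mamba

class MambaClassifierHeadModel(nn.Module):
    """Hypothetical sketch: a stack of Mamba blocks as the backbone, plus a linear classifier head."""
    def __init__(self, vocab_size: int, d_model: int, n_layer: int, num_classes: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(Mamba(d_model=d_model) for _ in range(n_layer))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layer))
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seqlen)
        x = self.embedding(input_ids)                    # (batch, seqlen, d_model)
        for norm, mixer in zip(self.norms, self.layers):
            x = x + mixer(norm(x))                       # pre-norm residual Mamba block
        pooled = x.mean(dim=1)                           # mean-pool over the sequence
        return self.head(pooled)                         # (batch, num_classes)
```

Taking the last token's hidden state instead of mean pooling is another common choice for a causal backbone.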