selective_scan_cuda error
I'm on an M1 Mac (macOS) with Python 3.10 and PyTorch 2.2.1, and I tried to use mamba_ssm.ops.selective_scan_interface natively, so I tried skipping the selective_scan_cuda import here. It turns out that this works, and the model can also be moved with model.to("mps"), so I made this modification attempt.
This is an interesting discovery. Just curious: is there a significant speedup from mps over cpu?
You can put the import in a try/except, but I wouldn't call the selective_scan_ref function from selective_scan_fn when selective_scan_cuda is not found. Instead, it should raise an error.
We don't want people to silently get much slower performance if they forgot to install the CUDA extension, or the installation was not correct.
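For illustration, here is a minimal sketch of that suggestion: guard the import, but fail loudly at call time instead of falling back. Only the module name selective_scan_cuda comes from this thread; require_extension and selective_scan_fn_sketch are hypothetical names, and the real kernel dispatch is deliberately left out.

```python
# Sketch only: guarded import that still errors when the extension is missing.
try:
    import selective_scan_cuda  # compiled CUDA extension, may be absent (e.g. on macOS)
except ImportError:
    selective_scan_cuda = None


def require_extension(module, name):
    """Return the module, or raise a clear error if its import failed."""
    if module is None:
        raise RuntimeError(
            f"{name} is not available; install or rebuild the CUDA extension "
            "rather than silently falling back to a much slower path."
        )
    return module


def selective_scan_fn_sketch(*args, **kwargs):
    # Hypothetical stand-in for the real entry point: check first, then dispatch.
    require_extension(selective_scan_cuda, "selective_scan_cuda")
    raise NotImplementedError("kernel dispatch is omitted in this sketch")
```

With this shape, importing the module succeeds on machines without the extension, and the error only appears when the fast path is actually requested.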
Hello, my English is not very good, so I used a translation tool to reply. On the question of MPS versus CPU speed: you can check out this official blog post; MPS does in fact bring some improvement. https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/
Can you give me a brief intro to how it works on the mps device? I would appreciate it if you could contact me.