RuntimeError: Expected is_sm90 || is_sm8x || is_sm75 to be true, but got false.
Hi. I managed to install scGPT successfully. However, when I try to run some of the fine-tuning code, I get an error associated with flash_attn_cuda:
---> 21 softmax_lse, *rest = flash_attn_cuda.fwd(
22 q, k, v, out, cu_seqlens_q, cu_seqlens_k, max_seqlen_q, max_seqlen_k, dropout_p,
23 softmax_scale, False, causal, return_softmax, num_splits, generator
24 )
25 # if out.isnan().any() or softmax_lse.isnan().any():
26 # breakpoint()
27 S_dmask = rest[0] if return_softmax else None
RuntimeError: Expected is_sm90 || is_sm8x || is_sm75 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
I have the following package versions:
- CUDA = 11.7.0
- torch = 1.13.0+cu117
- flash-attn = 1.0.1 (I also tested 1.0.3 and 1.0.4)
There is a thread (https://github.com/pytorch/pytorch/issues/94883) where this was fixed by installing a PyTorch nightly. However, that results in a dependency conflict with the pinned torch package:
scgpt 0.1.2.post1 requires torch==1.13.0, but you have torch 2.1.0.dev20230621+cu117
Any suggestions would be greatly appreciated! Let me know if you need any more info. Thanks!
Hello, I'm having the same issue. I'm using CUDA 11.7.0, torch 2.0.1+cu117, and flash-attn 1.0.4. I initially thought it was an issue specific to the GPU I was using. For me, the error occurs when I run training.
Hi, what GPU are you using? This looks like a known compatibility limitation of flash-attn, which only supports newer GPUs. You can find the list of supported ones here: https://github.com/HazyResearch/flash-attention#installation-and-features
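If you are not sure which architecture your GPU is, here is a quick sanity check using plain PyTorch; the sm_75 threshold below corresponds to the `is_sm90 || is_sm8x || is_sm75` check in the error message:

```python
import torch

# Print the compute capability of each visible GPU.
# flash-attn 1.x requires sm_75 (Turing) or newer, i.e. (major, minor) >= (7, 5).
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    ok = (major, minor) >= (7, 5)
    print(f"GPU {i}: {name} (sm_{major}{minor}) -> flash-attn supported: {ok}")
```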
BTW, we have also noticed a lot of reported issues when people install flash-attn. My current plan is to:
- [ ] Provide a working Docker image ASAP, which hopefully will be useful for people who can access Docker.
- [x] For people whose GPUs do not support flash-attn, improve compatibility with native PyTorch and make the flash-attn dependency optional.
- [x] For people without GPU access, provide better support for running on CPUs.
Yes, it seems it's a hardware limitation, since my GPU (NVIDIA Quadro P4000) is not supported by FlashAttention. Thanks for your efforts!
Having the same issue on a Tesla V100-SXM2 with flash_attn==1.0.4.
Hi @mjstrumillo, flash_attn currently requires Turing or later GPUs (the V100 is Volta, so it is not supported). Please see this comment: https://github.com/bowang-lab/scGPT/issues/39#issuecomment-1635989348.
The current issue when running without flash_attn is with loading the pretrained weights. flash_attn and PyTorch use different parameter names for the multi-head attention layers in the transformer, and since the pretraining was done with flash_attn, the checkpoint uses flash_attn's naming style. So if you cannot use flash_attn, a quick workaround is to add a mapping of the parameter names when you load the pretrained weights. We also plan to support this natively in the near future.
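As a rough sketch of what that mapping could look like (the key names below are illustrative only; inspect your checkpoint's keys and your model's `state_dict()` to find the actual differences for your scGPT and flash-attn versions):

```python
import torch

# Hypothetical mapping from flash-attn style parameter names to the names a
# plain-PyTorch attention layer would expect. These entries are placeholders,
# e.g. flash-attn's fused "Wqkv" projection vs. PyTorch's "in_proj_weight";
# compare ckpt.keys() with model.state_dict().keys() and adjust accordingly.
NAME_MAP = {
    "self_attn.Wqkv.weight": "self_attn.in_proj_weight",
    "self_attn.Wqkv.bias": "self_attn.in_proj_bias",
}

def remap_flash_attn_keys(state_dict, name_map=NAME_MAP):
    """Return a copy of state_dict with flash-attn style keys renamed."""
    remapped = {}
    for key, value in state_dict.items():
        for old, new in name_map.items():
            if old in key:
                key = key.replace(old, new)
                break
        remapped[key] = value
    return remapped

# Usage sketch (assumes `model` is an scGPT model built without flash-attn):
# ckpt = torch.load("best_model.pt", map_location="cpu")
# missing, unexpected = model.load_state_dict(remap_flash_attn_keys(ckpt), strict=False)
# print(missing, unexpected)  # shows which parameters still did not line up
```

Loading with `strict=False` is just a way to see which keys remain mismatched while you refine the mapping; once everything lines up you can switch back to a strict load.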
@subercui Could you explain what an alternative to flash_attn would be? What should I change in the code to work around this? I'm getting the same error. The problem is that when I set use_flash_attn = False, loading the pretrained model fails.