RuntimeError: Expected is_sm90 || is_sm8x || is_sm75 to be true, but got false.
Hi. I managed to install scGPT successfully. However, when I try to run some of the fine-tuning code, I get an error associated with flash_attn_cuda:
---> 21 softmax_lse, *rest = flash_attn_cuda.fwd(
22 q, k, v, out, cu_seqlens_q, cu_seqlens_k, max_seqlen_q, max_seqlen_k, dropout_p,
23 softmax_scale, False, causal, return_softmax, num_splits, generator
24 )
25 # if out.isnan().any() or softmax_lse.isnan().any():
26 # breakpoint()
27 S_dmask = rest[0] if return_softmax else None
RuntimeError: Expected is_sm90 || is_sm8x || is_sm75 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
I have the following package versions:
- CUDA = 11.7.0
- torch = 1.13.0+cu117
- flash-attn = 1.0.1 (I also tested 1.0.3 and 1.0.4)
There is a thread (https://github.com/pytorch/pytorch/issues/94883) where this was fixed by installing a PyTorch nightly. However, that results in a dependency conflict with the pinned torch package:
scgpt 0.1.2.post1 requires torch==1.13.0, but you have torch 2.1.0.dev20230621+cu117
Any suggestions would be greatly appreciated! Let me know if you need any more info. Thanks!
Hello, I'm having the same issue. I'm using CUDA 11.7.0, torch 2.0.1+cu117, and flash-attn 1.0.4. I initially thought it was an issue specific to the GPU I was using. For me, the error occurs when I run training.
Hi, what GPU are you using? This looks like a known compatibility limitation of flash-attn, which only supports newer GPUs. You can find the list of supported ones here: https://github.com/HazyResearch/flash-attention#installation-and-features
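If you are not sure which architecture your GPU is, here is a quick sanity check using plain PyTorch; the sm_75 threshold below corresponds to the `is_sm90 || is_sm8x || is_sm75` check in the error message:

```python
import torch

# Print the compute capability of each visible GPU.
# flash-attn 1.x requires sm_75 (Turing) or newer, i.e. (major, minor) >= (7, 5).
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    ok = (major, minor) >= (7, 5)
    print(f"GPU {i}: {name} (sm_{major}{minor}) -> flash-attn supported: {ok}")
```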
BTW, we have also noticed a lot of reported issues when people install flash-attn. My current plan is to:
- [ ] Provide a working Docker image ASAP, which hopefully will be useful for people who can access Docker.
- [x] For people whose GPUs do not support flash-attn, improve compatibility with native PyTorch and make the flash-attn dependency optional.
- [x] For people without GPU access, provide better support for running on CPUs.
Yes, it seems it's a hardware limitation, since my GPU (NVIDIA Quadro P4000) is not supported by FlashAttention. Thanks for your efforts!
Having the same issue on a Tesla V100-SXM2 with flash_attn==1.0.4.
Hi @mjstrumillo, flash_attn currently requires Turing or later GPUs (the V100 is Volta, so it is not supported). Please see this comment: https://github.com/bowang-lab/scGPT/issues/39#issuecomment-1635989348.
The current issue when running without flash_attn is with loading the pretrained weights. flash_attn and PyTorch use different parameter names for the multi-head attention layers in the transformer, and since the pretraining was done with flash_attn, the checkpoint uses flash_attn's naming style. So if you cannot use flash_attn, a quick workaround is to add a mapping of the parameter names when you load the pretrained weights. We also plan to support this natively in the near future.
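As a rough sketch of what that mapping could look like (the key names below are illustrative only; inspect your checkpoint's keys and your model's `state_dict()` to find the actual differences for your scGPT and flash-attn versions):

```python
import torch

# Hypothetical mapping from flash-attn style parameter names to the names a
# plain-PyTorch attention layer would expect. These entries are placeholders,
# e.g. flash-attn's fused "Wqkv" projection vs. PyTorch's "in_proj_weight";
# compare ckpt.keys() with model.state_dict().keys() and adjust accordingly.
NAME_MAP = {
    "self_attn.Wqkv.weight": "self_attn.in_proj_weight",
    "self_attn.Wqkv.bias": "self_attn.in_proj_bias",
}

def remap_flash_attn_keys(state_dict, name_map=NAME_MAP):
    """Return a copy of state_dict with flash-attn style keys renamed."""
    remapped = {}
    for key, value in state_dict.items():
        for old, new in name_map.items():
            if old in key:
                key = key.replace(old, new)
                break
        remapped[key] = value
    return remapped

# Usage sketch (assumes `model` is an scGPT model built without flash-attn):
# ckpt = torch.load("best_model.pt", map_location="cpu")
# missing, unexpected = model.load_state_dict(remap_flash_attn_keys(ckpt), strict=False)
# print(missing, unexpected)  # shows which parameters still did not line up
```

Loading with `strict=False` is just a way to see which keys remain mismatched while you refine the mapping; once everything lines up you can switch back to a strict load.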
@subercui Could you explain what an alternative to flash_attn would be? What should I change in the code to work around this? I'm getting the same error. The problem is that when I set use_flash_attn = False, loading the pretrained model fails.