Firefly
Fine-tuned Qwen-7B; inference fails with RuntimeError: FlashAttention only supports Ampere GPUs or newer.
    context_layer = self.core_attention_flash(q, k, v, attention_mask=attention_mask)
  File "/home/ec2-user/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/.cache/huggingface/modules/transformers_modules/Qwen-7B-qlora-sft-merge/modeling_qwen.py", line 213, in forward
    output = flash_attn_unpadded_func(
  File "/home/ec2-user/anaconda3/envs/llm-py3.9-tf2.11/lib/python3.9/site-packages/flash_attn/flash_attn_interface.py", line 906, in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
  File "/home/ec2-user/anaconda3/envs/llm-py3.9-tf2.11/lib/python3.9/site-packages/flash_attn/flash_attn_interface.py", line 496, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
  File "/home/ec2-user/anaconda3/envs/llm-py3.9-tf2.11/lib/python3.9/site-packages/flash_attn/flash_attn_interface.py", line 79, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
What GPU are you using? FlashAttention requires an Ampere-or-newer card; if it is a consumer (gaming) card, it must be a 30-series or later.
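If you are stuck on a pre-Ampere card (e.g. V100, T4, or a 20-series), one workaround is to keep FlashAttention off for inference. A minimal sketch, assuming the merged checkpoint's remote code reads a use_flash_attn config field like the official Qwen-7B modeling code does (the local path below is taken from the traceback and may differ on your machine):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/.cache/huggingface/modules/transformers_modules/Qwen-7B-qlora-sft-merge"  # path from the traceback; adjust to your checkpoint

# FlashAttention needs compute capability >= 8.0 (Ampere); check the current GPU first.
major, _minor = torch.cuda.get_device_capability(0)
use_flash = major >= 8

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    use_flash_attn=use_flash,  # assumption: forwarded to the Qwen config by the remote code
).eval()

Depending on the checkpoint's remote code, simply uninstalling the flash-attn package may also force the fallback to the standard attention implementation.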