Inference without Flash Attention is Needed!

Open EdmunddzzZ opened this issue 4 months ago • 2 comments

Ovis2.5 is awesome!

But modeling_ovis2_5.py calls flash_attn directly at line 246:

    attn_output = flash_attn_varlen_func(queries, keys, values, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen).reshape(seq_length, -1)

I made several attempts but couldn't find a good way to replace flash_attn. It would be very nice if the Ovis team could offer a version of Ovis2.5 that doesn't require flash_attn, so that it runs on GPUs such as the V100 and 2080 Ti.
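One possible workaround is to emulate the varlen call with PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which also runs on pre-Ampere GPUs. The sketch below assumes that `queries`, `keys` and `values` are packed as `(total_tokens, num_heads, head_dim)` and that `cu_seqlens` holds cumulative sequence boundaries such as `[0, len0, len0 + len1, ...]`, which is the layout `flash_attn_varlen_func` expects. It also assumes non-causal attention with the default `1/sqrt(head_dim)` scale, matching the call above (no `causal` or `softmax_scale` arguments). The function name `sdpa_varlen_attention` is hypothetical and is not part of the Ovis code or of flash-attn.

```python
import torch
import torch.nn.functional as F


def sdpa_varlen_attention(queries, keys, values, cu_seqlens):
    """Hypothetical stand-in for flash_attn_varlen_func (non-causal, self-attention).

    queries, keys, values: (total_tokens, num_heads, head_dim), with all
        sequences concatenated along dim 0.
    cu_seqlens: 1-D integer tensor of cumulative boundaries, e.g. [0, 3, 8, 10].
    Returns a tensor of shape (total_tokens, num_heads, head_dim).
    """
    outputs = []
    bounds = cu_seqlens.tolist()
    for start, end in zip(bounds[:-1], bounds[1:]):
        # SDPA wants (..., seq_len, head_dim), so move heads in front: (heads, len, dim).
        q = queries[start:end].transpose(0, 1)
        k = keys[start:end].transpose(0, 1)
        v = values[start:end].transpose(0, 1)
        # Each sequence attends only to itself; scale defaults to 1/sqrt(head_dim).
        out = F.scaled_dot_product_attention(q, k, v)
        # Back to (len, heads, dim) so the pieces can be re-packed along dim 0.
        outputs.append(out.transpose(0, 1))
    return torch.cat(outputs, dim=0)
```

If the tensors at line 246 match those assumptions, the call could become `attn_output = sdpa_varlen_attention(queries, keys, values, cu_seqlens).reshape(seq_length, -1)`. Looping over sequences keeps memory proportional to the longest sequence. A single SDPA call with a block-diagonal boolean `attn_mask` would also work, but it materializes a `total_tokens x total_tokens` mask.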

Thank you!

Aug 22 '25 05:08 EdmunddzzZ

You can build the flash-attn package from source.

Oct 20 '25 04:10 hfassold

> You can build the flash-attn package from source.

That won't be enough, since Turing and other pre-Ampere GPUs aren't supported /:
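Building from source does not change which architectures the kernels target. A minimal sketch of a runtime check follows. It assumes the FlashAttention 2 requirement of compute capability 8.0 (Ampere) or newer, while the V100 reports (7, 0) and the 2080 Ti reports (7, 5). The function names and the returned labels `"flash_attn"` / `"sdpa"` are hypothetical and exist only for illustration.

```python
def flash_attn2_supported(capability):
    """True if a (major, minor) CUDA compute capability is Ampere (sm_80) or newer."""
    major, _minor = capability
    return major >= 8


def pick_attention_backend():
    """Hypothetical dispatch: use flash-attn only where its kernels can run."""
    import torch

    if not torch.cuda.is_available():
        return "sdpa"
    if flash_attn2_supported(torch.cuda.get_device_capability()):
        try:
            import flash_attn  # noqa: F401  (only checks that the package is installed)
            return "flash_attn"
        except ImportError:
            pass
    # Turing (7.5), Volta (7.0) and older fall back to a PyTorch SDPA path.
    return "sdpa"
```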

Oct 21 '25 15:10 wsbagnsv1