Ovis
Inference without Flash Attention is Needed!
Ovis2.5 is awesome!
But in modeling_ovis2_5.py line 246:
attn_output = flash_attn_varlen_func(queries, keys, values, cu_seqlens, cu_seqlens, max_seqlen, max_seqlen).reshape(seq_length, -1)
I made several attempts but couldn't find a good way to replace flash_attn. It would be very nice if the Ovis team could offer a version of Ovis2.5 that works without flash_attn, to support GPUs such as the V100, 2080 Ti, etc.
Thank you!
You can build the flash_attn package from source.
That won't be enough, since Turing etc. are not supported :/
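One possible workaround, as a rough sketch only: replace the call with PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` (PyTorch 2.0+), running it once per packed sequence. The helper name `varlen_attention_sdpa` is made up for illustration and is not part of Ovis or flash_attn. It assumes `queries`, `keys`, and `values` have the packed `(total_tokens, num_heads, head_dim)` layout that `flash_attn_varlen_func` expects, that attention is non-causal with no dropout (the flash_attn defaults), and that `cu_seqlens` starts at 0. It has not been checked against the Ovis2.5 weights.

```python
import torch
import torch.nn.functional as F


def varlen_attention_sdpa(queries, keys, values, cu_seqlens):
    """Hypothetical stand-in for flash_attn_varlen_func (non-causal, no dropout).

    queries, keys, values: (total_tokens, num_heads, head_dim), sequences packed
        back to back along the first dimension.
    cu_seqlens: 1-D integer tensor of cumulative sequence lengths, starting at 0.
    Returns a tensor of shape (total_tokens, num_heads, head_dim).
    """
    output = torch.empty_like(queries)
    bounds = cu_seqlens.tolist()
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end == start:
            continue  # skip empty sequences
        # (len, heads, dim) -> (1, heads, len, dim), the layout SDPA expects
        q = queries[start:end].transpose(0, 1).unsqueeze(0)
        k = keys[start:end].transpose(0, 1).unsqueeze(0)
        v = values[start:end].transpose(0, 1).unsqueeze(0)
        # Tokens attend only within their own sequence; default scale is 1/sqrt(head_dim)
        o = F.scaled_dot_product_attention(q, k, v)
        # (1, heads, len, dim) -> (len, heads, dim), written back in packed order
        output[start:end] = o.squeeze(0).transpose(0, 1)
    return output
```

With that helper, the line in modeling_ovis2_5.py would become something like `attn_output = varlen_attention_sdpa(queries, keys, values, cu_seqlens).reshape(seq_length, -1)`. The Python loop is slower than the fused kernel when there are many short sequences, but it does not depend on flash_attn at all.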