Cunxiao Du


Thanks for your reply! However, in my test case with grouped-query attention, the gradients of k and v do not pass an allclose check when comparing the plain PyTorch implementation against the fused attention kernel.
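
A minimal sketch of the kind of check described above, assuming `flash_attn_func` from the flash-attn package as the fused kernel and a plain PyTorch reference that expands the KV heads for grouped-query attention; shapes, dtypes, and tolerances here are illustrative, not the original test case.

```python
import torch
from flash_attn import flash_attn_func

torch.manual_seed(0)
B, S, Hq, Hkv, D = 2, 128, 8, 2, 64  # 8 query heads sharing 2 KV heads (GQA)
dtype = torch.float16

q = torch.randn(B, S, Hq, D, device="cuda", dtype=dtype, requires_grad=True)
k = torch.randn(B, S, Hkv, D, device="cuda", dtype=dtype, requires_grad=True)
v = torch.randn(B, S, Hkv, D, device="cuda", dtype=dtype, requires_grad=True)

# Reference: expand KV heads so each group of query heads sees its shared KV head.
k_ref = k.repeat_interleave(Hq // Hkv, dim=2)
v_ref = v.repeat_interleave(Hq // Hkv, dim=2)
scores = torch.einsum("bshd,bthd->bhst", q.float(), k_ref.float()) / D ** 0.5
out_ref = torch.einsum("bhst,bthd->bshd", scores.softmax(dim=-1), v_ref.float())

# Fused kernel handles GQA directly when Hkv divides Hq.
out_fused = flash_attn_func(q, k, v, causal=False)

grad_out = torch.randn_like(out_fused)
dq_ref, dk_ref, dv_ref = torch.autograd.grad(out_ref, (q, k, v), grad_out.float())
dq, dk, dv = torch.autograd.grad(out_fused, (q, k, v), grad_out)

for name, a, b in [("dq", dq, dq_ref), ("dk", dk, dk_ref), ("dv", dv, dv_ref)]:
    print(name, torch.allclose(a.float(), b, atol=1e-2, rtol=1e-2))
```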

flash_attn_with_kv_cache will update the KV cache automatically, so I'm closing the issue.
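
For reference, a minimal decode-step sketch, assuming the flash-attn package: `flash_attn_with_kv_cache` writes the new k/v into `k_cache` / `v_cache` in place at the positions given by `cache_seqlens`, which is the "automatic update" referred to above; shapes and the `causal` flag are illustrative.

```python
import torch
from flash_attn import flash_attn_with_kv_cache

B, Hq, Hkv, D, max_len = 2, 8, 2, 64, 256
dtype = torch.float16

k_cache = torch.zeros(B, max_len, Hkv, D, device="cuda", dtype=dtype)
v_cache = torch.zeros(B, max_len, Hkv, D, device="cuda", dtype=dtype)
cache_seqlens = torch.zeros(B, dtype=torch.int32, device="cuda")  # tokens already cached

# One decode step: a single new token per sequence.
q = torch.randn(B, 1, Hq, D, device="cuda", dtype=dtype)
k = torch.randn(B, 1, Hkv, D, device="cuda", dtype=dtype)
v = torch.randn(B, 1, Hkv, D, device="cuda", dtype=dtype)

out = flash_attn_with_kv_cache(
    q, k_cache, v_cache, k=k, v=v, cache_seqlens=cache_seqlens, causal=True
)
cache_seqlens += 1  # the caches were updated in place; the caller advances the lengths
```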