Jiarui Fang(方佳瑞)
@baifanxxx Did you apply USP in training, or in inference only?
Could you please try this solution? I can hardly build a test script to reproduce the memory leak issue. Maybe it only appears when combined with other communications, for example the allgather in your...
Could you please submit a PR to help fix the bug?
Thanks a lot for your great contribution! The PR has been merged — really appreciate your work on adapting xdit and yunchang for HUAWEI Ascend NPU.
This is a default parameter; it will be assigned the correct value when actually used.
[Doesn't select_flash_attn_impl already pick the correct function according to attn_type? What function](https://github.com/feifeibear/long-context-attention/blob/2c9b7120e70392c83acd2006a4f716aa407143ac/yunchang/ring/ring_flash_attn.py#L175) Please check the latest main branch. Is the commit you are looking at too old?
Understood. Could you submit a PR that passes attn_type to the pytorch_attn_forward function being called?
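For context, here is a minimal sketch of what forwarding attn_type from select_flash_attn_impl into pytorch_attn_forward could look like. This is not the actual yunchang code; the enum members and signatures below are illustrative assumptions based on the names mentioned in this thread.

```python
# Hypothetical sketch: dispatch an attention backend based on attn_type and
# pass attn_type through explicitly instead of relying on a default argument.
from enum import Enum, auto


class AttnType(Enum):
    FA = auto()         # flash-attn backend (assumed)
    EFFICIENT = auto()  # torch memory-efficient SDPA backend (assumed)
    TORCH = auto()      # plain torch SDPA backend (assumed)


def pytorch_attn_forward(q, k, v, attn_type=AttnType.EFFICIENT):
    """Hypothetical backend stub; in practice it would branch on attn_type."""
    ...


def select_flash_attn_impl(attn_type):
    """Return a forward function bound to the requested backend, so callers
    do not depend on pytorch_attn_forward's default argument value."""
    if attn_type in (AttnType.TORCH, AttnType.EFFICIENT):
        def fn(q, k, v):
            # Explicitly forward attn_type instead of using the default.
            return pytorch_attn_forward(q, k, v, attn_type=attn_type)
        return fn
    if attn_type == AttnType.FA:
        from flash_attn import flash_attn_func  # optional dependency
        return flash_attn_func
    raise ValueError(f"unsupported attn_type: {attn_type}")
```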
Why is efficient better as the default? Is fa not as good as efficient?
AttnType.TORCH makes a finer-grained distinction; that's fine.
USP can work for the prefill phase of LLM inference.