Jiarui Fang(方佳瑞)

Results: 220 comments of Jiarui Fang (方佳瑞)

@baifanxxx Did you apply USP in training, or only in inference?

Could you please try this solution? I could hardly build a test script that reproduces the memory leak. It may only show up when combined with other communication ops, for example the allgather in your...

Thanks a lot for your great contribution! The PR has been merged — really appreciate your work on adapting xdit and yunchang for HUAWEI Ascend NPU.

This is a default argument; it will be assigned the correct value when actually used.

[Doesn't select_flash_attn_impl already select the correct function according to attn_type? What function...](https://github.com/feifeibear/long-context-attention/blob/2c9b7120e70392c83acd2006a4f716aa407143ac/yunchang/ring/ring_flash_attn.py#L175) Could you check the latest main branch? The commit you are looking at may be too old.
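To illustrate the dispatch being discussed, here is a minimal sketch of a select-by-`attn_type` helper. The enum members and returned names are assumptions for illustration, not yunchang's actual API:

```python
from enum import Enum, auto

class AttnType(Enum):
    # hypothetical backend tags; the real yunchang enum may differ
    FA = auto()         # flash-attention kernel
    EFFICIENT = auto()  # memory-efficient backend
    TORCH = auto()      # plain PyTorch attention

def select_flash_attn_impl(attn_type):
    """Toy dispatch: map an attn_type tag to a backend function name."""
    table = {
        AttnType.FA: "flash_attn_forward",
        AttnType.EFFICIENT: "efficient_attn_forward",
        AttnType.TORCH: "pytorch_attn_forward",
    }
    try:
        return table[attn_type]
    except KeyError:
        raise ValueError(f"unsupported attn_type: {attn_type}")
```

The point of the comment: as long as callers pass `attn_type` through, a single table lookup like this already routes each request to the correct implementation.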

Got it. Could you submit a PR that passes attn_type through to the pytorch_attn_forward function it calls?
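The fix being requested is just to forward the parameter instead of dropping it. A minimal sketch, with placeholder bodies and an assumed call chain (neither function's real signature is shown here):

```python
def pytorch_attn_forward(q, k, v, attn_type="efficient"):
    # placeholder backend selection; real code would invoke the chosen kernel
    return f"ran {attn_type} attention"

def ring_attn_step(q, k, v, attn_type="efficient"):
    # Before the fix, the caller did pytorch_attn_forward(q, k, v),
    # silently falling back to the default backend. Forward it explicitly:
    return pytorch_attn_forward(q, k, v, attn_type=attn_type)
```

With the kwarg threaded through, choosing `attn_type="fa"` at the top level actually reaches the inner call instead of being overridden by the default.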

Why is efficient the better default? Is fa worse than efficient?

USP can work for the prefill phase of LLM inference.