Jiarui Fang(方佳瑞)
@baifanxxx Did you apply USP in training, or in inference only?
Could you please try this solution? I can hardly build a test script to reproduce the memory leak issue. Maybe it only appears when combined with other communications, for example the allgather in your...
Could you please submit a PR to help fix the bug?
Thanks a lot for your great contribution! The PR has been merged — really appreciate your work on adapting xdit and yunchang for HUAWEI Ascend NPU.
This is a default parameter; it will be assigned the correct value when actually used.
[Doesn't select_flash_attn_impl already pick the correct function according to attn_type? What function](https://github.com/feifeibear/long-context-attention/blob/2c9b7120e70392c83acd2006a4f716aa407143ac/yunchang/ring/ring_flash_attn.py#L175) Please check the latest main branch. Is the commit you are looking at too old?
Understood. Could you submit a PR that passes attn_type to the pytorch_attn_forward function being called?
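For context, here is a minimal sketch of what forwarding attn_type from select_flash_attn_impl into pytorch_attn_forward could look like. This is not the actual yunchang code; the enum members and signatures below are illustrative assumptions based on the names mentioned in this thread.

```python
# Hypothetical sketch: dispatch an attention backend based on attn_type and
# pass attn_type through explicitly instead of relying on a default argument.
from enum import Enum, auto


class AttnType(Enum):
    FA = auto()         # flash-attn backend (assumed)
    EFFICIENT = auto()  # torch memory-efficient SDPA backend (assumed)
    TORCH = auto()      # plain torch SDPA backend (assumed)


def pytorch_attn_forward(q, k, v, attn_type=AttnType.EFFICIENT):
    """Hypothetical backend stub; in practice it would branch on attn_type."""
    ...


def select_flash_attn_impl(attn_type):
    """Return a forward function bound to the requested backend, so callers
    do not depend on pytorch_attn_forward's default argument value."""
    if attn_type in (AttnType.TORCH, AttnType.EFFICIENT):
        def fn(q, k, v):
            # Explicitly forward attn_type instead of using the default.
            return pytorch_attn_forward(q, k, v, attn_type=attn_type)
        return fn
    if attn_type == AttnType.FA:
        from flash_attn import flash_attn_func  # optional dependency
        return flash_attn_func
    raise ValueError(f"unsupported attn_type: {attn_type}")
```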
Why is efficient better as the default? Is fa not as good as efficient?
AttnType.TORCH makes a finer-grained distinction; that's fine.
USP can work for the prefill phase of LLM inference.