Chenguang Li

11 comments by Chenguang Li

Additionally, I think we still need to verify that there are currently no memory leaks.

> @rmatif @noemotiovon Could you confirm whether the OpenCL and CANN implementations of Flash Attention assume that the KQ mask is padded as described in the OP?

Hi @ggerganov, I...

Hi @ruisearch42, it's possible that in the future the communication backend will not be limited to NCCL, as shown in this [PR](https://github.com/ray-project/ray/pull/51032) with HCCL. Therefore, would it be possible to...

This PR needs to follow https://github.com/ray-project/ray/issues/51574. Marking it as a draft.

This PR is a follow-up to https://github.com/ray-project/ray/pull/51032, which introduced multi-device support in the Compiled Graph by leveraging CUDA's NCCL backend for efficient out-of-band tensor communication. While the current implementation is...

Hi @ruisearch42, @hipudding, this PR aims to generalize the communication backend interface and decouple NCCL-specific logic, as a follow-up to #51032. Happy to hear any feedback or suggestions! :blush:

@ruisearch42, Thank you so much for the timely and careful review! Apologies for the leftover debugging code — that sneaky print wasn’t meant to stick around. :joy: I’ve updated the...

Hi @ruisearch42, Thanks again for your earlier review! I’ve updated the PR according to the feedback. When you have time, could you please take another look? Appreciate it!

Hi @ruisearch42, Thanks for the review, and apologies for the ambiguity caused by my oversight. I've addressed all the comments and corrected the issues accordingly. Looking forward to your continued...

Hi @ruisearch42, very sorry! I missed updating the assertions in the test file, which caused the CI failure. I've just fixed it. Could you please help me re-trigger the CI?...