Murali Andoorveedu

Results 50 comments of Murali Andoorveedu

Hey @wconstab gentle ping on this one :)

Hey @albanD @wconstab any update on this one? Would anyone be available to review this?

Is there anything else blocking this PR? From what I can tell, the test failures seem to be unrelated.

Hey @albanD, would you be able to help me fix the tags/reviewer list and kick off a new test? I messed up the rebase a little bit.

@youkaichao Yes for sure, it is one of the TODO items above

Updated the RFC here: https://github.com/vllm-project/vllm/issues/4461. @youkaichao, let me know if anything needs further elaboration.

FYI, I'm pretty sure PyTorch has a bug, filed here: https://github.com/pytorch/pytorch/issues/125079. I worked around this last week by making the sending and receiving phases for each model atomic by concatenating residuals and hidden...

Sounds good @youkaichao, I can update mine once that's merged. Will you also include the change to create the multiple CPU TP groups, or should I create a separate PR?

Sounds good - I'll revert the PyNCCL changes on this PR and wait for that to be merged before adding them back in.