Murali Andoorveedu
Hey @wconstab gentle ping on this one :)
Hey @albanD @wconstab any update on this one? Would anyone be available to review this?
Is there anything else blocking this PR? From what I can tell, the test failures seem to be unrelated.
Whoops let me fix that
Hey @albanD would you be able to help me fix the tags/reviewer list and kick off a new test run? I messed up the rebase a little bit.
@youkaichao Yes for sure, it is one of the TODO items above
Updated the RFC here: https://github.com/vllm-project/vllm/issues/4461 @youkaichao let me know if anything needs further elaboration.
FYI, I'm pretty sure PyTorch has a bug (filed at https://github.com/pytorch/pytorch/issues/125079). I worked around this last week by making the sending and receiving phases for each model atomic by concatenating residuals and hidden...
Sounds good @youkaichao, I can update mine once that's merged. Will you also include the change to create the multiple CPU TP groups, or should I create a separate PR?
Sounds good - I'll revert the PyNCCL changes on this PR and wait for that to be merged before adding them back in.