amithrm
@JackCaoG All four pass. @will-cromar is there anything else needed?
A simple example to get the conversation started and to build toward feature completeness:

```python
pipeline_cuts = ['layers.4']

class SimpleLinear(nn.Module):
    def __init__(self):
        super(SimpleLinear, self).__init__()
        self.fc1 = nn.Linear(FLAGS.input_dim, FLAGS.input_dim * 4, bias=False)
        self.relu...
```
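The snippet above is cut off by the comment preview. As a hypothetical completion only, here is a minimal sketch that assumes a plain stack of linear blocks so the `'layers.4'` cut point resolves to a real submodule path; everything past `fc1` (the `relu`/`layers`/`fc2` names, the forward pass, and the `FLAGS` stand-in) is my assumption, not the original code:

```python
# Hypothetical completion of the truncated example above; only `fc1` and
# `pipeline_cuts` come from the original comment.
from types import SimpleNamespace

import torch
import torch.nn as nn

FLAGS = SimpleNamespace(input_dim=128)  # stand-in for the example's FLAGS

class SimpleLinear(nn.Module):
    def __init__(self):
        super(SimpleLinear, self).__init__()
        self.fc1 = nn.Linear(FLAGS.input_dim, FLAGS.input_dim * 4, bias=False)
        self.relu = nn.ReLU()
        # A stack of blocks so that a cut point like 'layers.4' names an
        # actual submodule at which the model can be split into stages.
        self.layers = nn.ModuleList(
            nn.Linear(FLAGS.input_dim * 4, FLAGS.input_dim * 4, bias=False)
            for _ in range(8)
        )
        self.fc2 = nn.Linear(FLAGS.input_dim * 4, 1, bias=False)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        for layer in self.layers:
            x = self.relu(layer(x))
        return self.fc2(x)

# Split the model into two pipeline stages after the fifth block.
pipeline_cuts = ['layers.4']
```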
Trying to make this work, I'm hitting a basic issue; filed a ticket for it: https://github.com/pytorch/xla/issues/6647
Added the test case and rebased. @JackCaoG @alanwaketan
@baoleai @yitongh Is the send/recv using XLA Send/Recv? We are using all-reduce instead of send/recv to simplify our stack, and we can assume that only non-sharded tensors will be...
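For concreteness, here is a minimal sketch of the all-reduce-based transfer described above, under my own assumptions: the helper name `transfer_via_all_reduce` is hypothetical (not an API from torch_xla or this PR), and the trick is simply that the sender contributes the real values while the receiver contributes zeros, so a summed all-reduce over the two ranks delivers the tensor.

```python
# Sketch: emulating a point-to-point transfer with all-reduce.
# `transfer_via_all_reduce` is a hypothetical helper, not torch_xla API.
import torch
import torch_xla.core.xla_model as xm

def transfer_via_all_reduce(tensor, src_rank, dst_rank):
    """Move a non-sharded tensor from src_rank to dst_rank.

    The sender contributes the real values; every other rank in the
    replica group contributes zeros, so the summed all-reduce leaves
    the sender's values on the receiver.
    """
    contribution = (
        tensor if xm.get_ordinal() == src_rank else torch.zeros_like(tensor)
    )
    # Restrict the reduction to the two pipeline-adjacent ranks. In a
    # real multi-stage run, the replica groups handed to XLA must cover
    # every replica, not just this pair.
    groups = [[src_rank, dst_rank]]
    return xm.all_reduce(xm.REDUCE_SUM, contribution, groups=groups)
```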
@JackCaoG For the cc ops set-up, why do we need to set up PjRT in a different way? All we need is the graph with the correct replica groups, correct?...