
I'm checking with the TensorBoard team on the issue `Overall GPU FLOPS utilization is 0.0%` @mars1248

The purpose of `mark_step` is to make XLA compile and execute its current graph and materialize the tensors.
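
For context, here is a minimal sketch of where `mark_step` typically goes in a training loop (the tiny model and loss here are just placeholders, not from the issue):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(3):
    data = torch.randn(8, 4, device=device)
    target = torch.randn(8, 2, device=device)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    loss.backward()
    optimizer.step()
    # Everything above is only recorded lazily; mark_step() tells XLA to
    # compile and execute the accumulated graph and materialize the tensors.
    xm.mark_step()
    # .item() forces a device sync, so the loss value is only read afterwards.
    print(f'step {step} loss {loss.item()}')
```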

On a related note, can you comment out the line `print(f'Initial loss: {loss.item()}')` and try again? Also, which line in your script https://github.com/pytorch/xla/issues/6422#issuecomment-1980416414 fails?

The error `RuntimeError: Expecting scope to be empty but it is train_loop.` indicates that the code calls `mark_step` within `xp.StepTrace`. Plus, `xp.StepTrace` adds `xm.mark_step()` when exiting the scope per https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm#adding_traces....
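
To illustrate, here is a minimal sketch following the Cloud TPU profiling guide linked above (the model and port number are placeholders): the point is that there is no explicit `xm.mark_step()` inside the `StepTrace` scope, since the scope issues one itself on exit.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.profiler as xp

device = xm.xla_device()
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
server = xp.start_server(9012)  # profiler server; the port is arbitrary

for step in range(3):
    with xp.StepTrace('train_loop', step_num=step):
        data = torch.randn(8, 4, device=device)
        target = torch.randn(8, 2, device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(data), target)
        loss.backward()
        optimizer.step()
        # No explicit xm.mark_step() here: StepTrace inserts it when the
        # `with` block exits. Calling it inside the scope triggers the
        # "Expecting scope to be empty" error above.
```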

> I'm checking with tensorboard team on the issue Overall GPU FLOPS utilization is 0.0%

Just want to get back to you about it. The TensorBoard team told me this is...

TODO: expect both TPU CIs to be red.

> I've just confirmed that instantiating the model with XLA device solves the error. i.e. changing the line below with `str(self.benchmark_experiment.get_device())`
>
> https://github.com/pytorch/xla/blob/423bb0b295319a692ee21787edbff50d07361db7/benchmarks/torchbench_model.py#L233

I thought you wanted to instantiate...

Hi @jeffhataws, I have a [PR](https://github.com/pytorch/xla/pull/6022) (Fix global_device_count(), local_device_count() for single process on CUDA), and currently the test `test_zero1` is failing with this error: https://gist.github.com/vanbasten23/b65423f2fd9c9859c0d4ecd47e058cfa. So I tried to fix...
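
For reference, the counts the PR touches can be inspected like this (a minimal sketch, assuming these helpers live in `torch_xla.runtime`; the expected values depend on the setup and the fix, so none are asserted here):

```python
import torch_xla.runtime as xr

# In a single-process CUDA setup, these report how many devices the runtime
# sees globally and on the local host, respectively.
print('global_device_count:', xr.global_device_count())
print('local_device_count:', xr.local_device_count())
```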

> Thanks. Will take a look. In the meantime, could you check if CCops like allgather is working properly for this test in your setup?

Yeah, the CCops tests are...
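
For reference, a minimal sketch of an `all_gather` sanity check along those lines (the entry-point file name and process setup are hypothetical, not the exact CCops tests I ran):

```python
# Minimal all_gather sanity check across XLA processes.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()
    # Each process contributes a tensor holding its own ordinal.
    value = torch.tensor([xm.get_ordinal()], dtype=torch.float32, device=device)
    result = xm.all_gather(value, dim=0)
    xm.mark_step()
    print(f'ordinal {xm.get_ordinal()}: gathered {result.cpu().tolist()}')


if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())
```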