
I'm checking with the TensorBoard team on the issue `Overall GPU FLOPS utilization is 0.0%` @mars1248

The purpose of `mark_step` is to make XLA compile and execute its current graph and materialize the tensors.
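
For context, here is a minimal sketch of where `mark_step` typically goes in a training loop (the tiny model and loss here are just placeholders, not from the issue):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(3):
    data = torch.randn(8, 4, device=device)
    target = torch.randn(8, 2, device=device)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    loss.backward()
    optimizer.step()
    # Everything above is only recorded lazily; mark_step() tells XLA to
    # compile and execute the accumulated graph and materialize the tensors.
    xm.mark_step()
    # .item() forces a device sync, so the loss value is only read afterwards.
    print(f'step {step} loss {loss.item()}')
```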

On a related note, can you comment out the line `print(f'Initial loss: {loss.item()}')` and try again? Also, which line in your script https://github.com/pytorch/xla/issues/6422#issuecomment-1980416414 fails?

The error `RuntimeError: Expecting scope to be empty but it is train_loop.` indicates that the code calls `mark_step` within `xp.StepTrace`. Plus, `xp.StepTrace` adds `xm.mark_step()` when exiting the scope per https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm#adding_traces....
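
To illustrate, here is a minimal sketch following the Cloud TPU profiling guide linked above (the model and port number are placeholders): the point is that there is no explicit `xm.mark_step()` inside the `StepTrace` scope, since the scope issues one itself on exit.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.profiler as xp

device = xm.xla_device()
model = torch.nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
server = xp.start_server(9012)  # profiler server; the port is arbitrary

for step in range(3):
    with xp.StepTrace('train_loop', step_num=step):
        data = torch.randn(8, 4, device=device)
        target = torch.randn(8, 2, device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(data), target)
        loss.backward()
        optimizer.step()
        # No explicit xm.mark_step() here: StepTrace inserts it when the
        # `with` block exits. Calling it inside the scope triggers the
        # "Expecting scope to be empty" error above.
```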

> I'm checking with tensorboard team on the issue Overall GPU FLOPS utilization is 0.0%

Just want to get back to you about it. The TensorBoard team told me this is...

TODO: expect both TPU CIs to be red.

> I've just confirmed that instantiating the model with XLA device solves the error. i.e. changing the line below with `str(self.benchmark_experiment.get_device())`
>
> https://github.com/pytorch/xla/blob/423bb0b295319a692ee21787edbff50d07361db7/benchmarks/torchbench_model.py#L233

I thought you wanted to instantiate...

Hi @jeffhataws, I have a [PR](https://github.com/pytorch/xla/pull/6022) (Fix global_device_count(), local_device_count() for single process on CUDA), and currently the test `test_zero1` is failing with this error: https://gist.github.com/vanbasten23/b65423f2fd9c9859c0d4ecd47e058cfa. So I tried to fix...
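
For reference, the counts the PR touches can be inspected like this (a minimal sketch, assuming these helpers live in `torch_xla.runtime`; the expected values depend on the setup and the fix, so none are asserted here):

```python
import torch_xla.runtime as xr

# In a single-process CUDA setup, these report how many devices the runtime
# sees globally and on the local host, respectively.
print('global_device_count:', xr.global_device_count())
print('local_device_count:', xr.local_device_count())
```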

> Thanks. Will take a look. In the meantime, could you check if CCops like allgather is working properly for this test in your setup?

Yeah, the CCops tests are...
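
For reference, a minimal sketch of an `all_gather` sanity check along those lines (the entry-point file name and process setup are hypothetical, not the exact CCops tests I ran):

```python
# Minimal all_gather sanity check across XLA processes.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()
    # Each process contributes a tensor holding its own ordinal.
    value = torch.tensor([xm.get_ordinal()], dtype=torch.float32, device=device)
    result = xm.all_gather(value, dim=0)
    xm.mark_step()
    print(f'ordinal {xm.get_ordinal()}: gathered {result.cpu().tolist()}')


if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())
```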