Will Cromar

Results: 22 comments of Will Cromar

Simple test case to sanity-check that collectives work as expected: ``` $ gcloud compute tpus tpu-vm ssh --project=tpu-pytorch --zone=us-central2-b wcromar-v4-32 --internal-ip --worker=all --command 'PJRT_DEVICE=TPU python3 -c " import torch_xla.core.xla_model as...
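
Since the command above is cut off, here is a minimal sketch of the kind of sanity check I mean, assuming `xmp.spawn` with `PJRT_DEVICE=TPU` set on each TPU VM worker; the shapes and spawn arguments are illustrative, not the exact snippet:

```python
# Hypothetical sanity check (not the truncated snippet above): each replica
# all-reduces its ordinal, so every device should print the sum of all
# ordinals if the collective works.
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
  device = xm.xla_device()
  value = torch.tensor([float(xm.get_ordinal())], device=device)
  result = xm.all_reduce(xm.REDUCE_SUM, value)
  xm.mark_step()
  print(f'ordinal {xm.get_ordinal()}: {result.cpu().item()}')


if __name__ == '__main__':
  # Run with PJRT_DEVICE=TPU in the environment on each worker.
  xmp.spawn(_mp_fn, args=())
```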

@ronghanghu Thanks for flagging the issue with the other collectives. I did check `all_gather` as well, but I didn't think to try with `pin_layout=False`. This snippet gives the expected results:...
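
For reference, since the snippet above is truncated, this is roughly what I mean by checking `all_gather` with `pin_layout=False`; it's a sketch, not the exact code from the comment:

```python
# Hypothetical check: gather every replica's ordinal with pin_layout=False.
# Expected result on each device: [0, 1, ..., world_size - 1].
import torch
import torch_xla.core.xla_model as xm


def gather_ordinals():
  device = xm.xla_device()
  value = torch.tensor([xm.get_ordinal()], device=device)
  gathered = xm.all_gather(value, dim=0, pin_layout=False)
  xm.mark_step()
  return gathered.cpu()
```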

Also, `xm.rendezvous` doesn't work yet, but another early tester told us that they were able to work around it by creating a `gloo` process group and using `dist.barrier`.
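
A rough sketch of that workaround, assuming the usual `torch.distributed` environment variables (`MASTER_ADDR`, `MASTER_PORT`) are set; the helper names are mine, not the tester's code:

```python
# CPU-only synchronization while xm.rendezvous is unavailable under PJRT:
# create a gloo process group and use dist.barrier on it.
import torch.distributed as dist
import torch_xla.core.xla_model as xm


def init_cpu_barrier():
  dist.init_process_group(
      backend='gloo',
      rank=xm.get_ordinal(),
      world_size=xm.xrt_world_size())


def cpu_barrier():
  # Blocks until every process in the gloo group has reached this point.
  dist.barrier()
```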

`barrier` will almost certainly not work with threads if you use the global default process group (i.e., the one created by `init_process_group`), because each thread will use the same PG. It might work...
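
To make that concrete, here is a minimal sketch of where the default group comes in; passing an explicit group (e.g. one from `dist.new_group`) only changes which group you synchronize on and does not give each thread its own PG, which is the core issue:

```python
import torch.distributed as dist


def sync(group=None):
  # group=None means the global default process group created by
  # init_process_group, which every thread in this process shares. An
  # explicit group synchronizes only on that group, but the threads in a
  # process still share it.
  dist.barrier(group=group)
```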

Thanks @JackCaoG. I'll work on a README showing how to port from XRT to PjRt and how to run models without `xla_dist`.

Thanks @ronghanghu for the awesome detail on this issue! You touched on a few issues here. For now, I'll focus on implementing `xm.get_ordinal`, `xm.xla_device`, etc. to have the right default...
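
As a concrete reference for the defaults in question, here is a small sketch of what those APIs should report in each process; the helper is mine, and the exact values depend on the TPU topology:

```python
# Print the device/ordinal view each process should see once the defaults
# are correct (values here depend on the TPU topology).
import torch_xla.core.xla_model as xm


def report_topology():
  device = xm.xla_device()        # device assigned to this process
  ordinal = xm.get_ordinal()      # global ordinal in [0, world_size)
  local = xm.get_local_ordinal()  # ordinal within this host
  world = xm.xrt_world_size()     # total number of devices
  print(f'{device}: ordinal={ordinal} local_ordinal={local} world_size={world}')
```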

The new test passed on CPU but flaked on GPU: ``` 2022-08-10 23:44:46.343597: W 562435 tensorflow/core/profiler/lib/profiler_session.cc:107] Profiling is late by 1594605 nanoseconds and will start immediately. 2022-08-10 23:44:48.736212: W 562436...

The GPU test has `Profiling is late by 1594605 nanoseconds` vs CPU with `Profiling is late by 760266 nanoseconds`. Maybe the XLA execution had finished by the time tracing had...

There is some lag between when the tracer thread starts and when it actually starts tracing, long enough that XLA execution can finish before it starts. This test was only...
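
For illustration, a sketch of the timing involved using `torch_xla.debug.profiler`; the sleep and the workload are illustrative guesses, not the actual test code:

```python
# Request a trace from a background thread, then wait briefly so the tracer
# has actually started capturing before the (short) XLA execution runs.
import threading
import time

import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.profiler as xp

server = xp.start_server(9012)

tracer = threading.Thread(
    target=xp.trace, args=('localhost:9012', '/tmp/profile'),
    kwargs={'duration_ms': 2000})
tracer.start()
time.sleep(1)  # illustrative delay so tracing begins before execution does

device = xm.xla_device()
a = torch.ones(1000, 1000, device=device)
result = a @ a
xm.mark_step()
tracer.join()
```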

The test still flaked on GPU (even though it only uses the CPU device), and I can't reproduce the error locally. Removing it from the CI tests since I'll have to add a TPU...