Wonjoo Lee
PyTorch python op tests are failing:

```
======================================================================
ERROR: test_upsamplingNearest2d_xla (__main__.TestNNDeviceTypeXLA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 391, in instantiated_test
    raise rte
  File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 378, in...
```
Ok, after removing `upsample_nearest2d.vec` and updating the existing `GetOutputSizeWithScale` function to accept `scale_h` and `scale_w`, as here: https://github.com/pytorch/xla/blob/1d0c3393fb48cb8740379e4fea9c37a0e131a7dd/torch_xla/csrc/aten_xla_type.cpp#L165-L175, I can confirm that the related cpp tests pass: `UpsampleNearest2D`, `UpsampleNearest2DWithScale`,...
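For context, a minimal Python sketch of the logic a `GetOutputSizeWithScale`-style helper would implement, assuming the usual PyTorch convention that the output spatial size is `floor(input * scale)` per dimension (the function name and signature here are illustrative, not the actual C++ code at the link):

```python
import math

# Illustrative sketch: compute an upsampled NCHW output size from
# per-dimension scale factors, assuming output = floor(input * scale).
def get_output_size_with_scale(input_size, scale_h, scale_w):
    n, c, h, w = input_size
    return [n, c,
            int(math.floor(h * scale_h)),
            int(math.floor(w * scale_w))]

print(get_output_size_with_scale([1, 3, 4, 4], 2.0, 2.0))  # [1, 3, 8, 8]
```

Passing `scale_h` and `scale_w` separately (rather than a single scale) is what lets the non-uniform-scale cpp tests above exercise different factors per spatial dimension.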
One guess is that while the total amount of HBM is equal between v3 and v4, the HBM bandwidth of v4 is higher than that of v3 (https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_v4), so it...
I've opened https://github.com/pytorch/xla/pull/4480 (picking up @milesial's https://github.com/pytorch/xla/pull/4471). @milesial, can we update the XLA commit pin (https://github.com/pytorch/pytorch/blob/master/.github/ci_commit_pins/xla.txt) in this PR to `eddaa4b3cf7c4c9302b6b04c6e5d13b4c6ba260b` and let the CI verify? Thanks!
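Updating the pin amounts to rewriting a one-line file in the pytorch/pytorch checkout. A sketch (the path is from the linked URL; writing it via Python rather than `echo` is just for illustration):

```python
from pathlib import Path

# Sketch: rewrite the XLA commit pin file in a pytorch/pytorch checkout.
# CI reads this file to decide which pytorch/xla commit to build against.
pin = Path(".github/ci_commit_pins/xla.txt")
pin.parent.mkdir(parents=True, exist_ok=True)  # ensure path exists for this demo
pin.write_text("eddaa4b3cf7c4c9302b6b04c6e5d13b4c6ba260b\n")
print(pin.read_text().strip())
```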
~So, it looks like the XLA PR treats `RuntimeError`s as failures, so I updated https://github.com/pytorch/xla/pull/4480 to explicitly skip the `test_clip_grad_value_foreach_True_*` and `test_clip_grad_norm_foreach_True_*` tests.~ Oh, I just saw you...
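For anyone unfamiliar with how an explicit skip looks in a `unittest`-based suite like these, a self-contained sketch (the test names echo the patterns above; the actual skip mechanism in the XLA PR may differ, e.g. a skip list in the harness):

```python
import unittest

class TestClipGrad(unittest.TestCase):
    # Explicitly skipped rather than allowed to fail with RuntimeError.
    @unittest.skip("foreach clip_grad not supported on XLA yet")
    def test_clip_grad_value_foreach_True_xla(self):
        pass

    def test_other(self):
        self.assertTrue(True)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestClipGrad)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # one skipped test
```

The point of skipping explicitly is that the suite stays green while recording exactly which tests are excluded and why.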
Looks like the CIs are green on both sides. Let's coordinate a merge tomorrow, thanks!
> @wonjoolee95 CI passed, can you merge the XLA PR? @milesial, just merged to master. The new pin should be `eac4e547138ab22a9b41c6f96208613fd7dd19d5`.
Okay, since we can't force merge this right now, I'm going to revert the XLA PR lol.
You can update the XLA pin in this PR to `8dcab83819368f468dadbe6e81b064d268830df2` and `merge -g`. I'll merge the companion XLA PR once this one merges.
Thanks for reporting the issue. Pasting the error here for visibility:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[8], line 4
      1 generator = torch.Generator().manual_seed(0)
      2 # xm.mark_step...
```