Cannot move tensors to CPU inside an xmp.spawn process
🐛 Bug
all_frames = torch.cat(all_frames, dim=0).cpu().numpy()
RuntimeError: Bad StatusOr access: INTERNAL: during context [pre-optimization]: RET_CHECK failure (third_party/tensorflow/compiler/xla/service/hlo_verifier.cc:402) replica_count == 1 || n == replica_count In kCrossReplica mode, replica groups should contain 8 replicas, but found 2: %all-gather.20571 = f16[81920,8,64]{2,1,0} all-gather(f16[40960,8,64]{2,1,0} %add.20570), replica_groups={{0,1}}, dimensions={0}
To Reproduce
Steps to reproduce the behavior:
- Spawn worker processes with xmp.spawn.
- Inside a worker, move a tensor to the CPU with .cpu().
Expected behavior
The tensors should be moved to the CPU without error.
Environment
- Reproducible on XLA backend [CPU/TPU/CUDA]: TPU v2-8 and v3-8
- torch_xla version: nightly 2.6