jeffhataws


@JackCaoG looks like build is still failing for some reason after rebasing. Maybe another rebase is needed?

Replaced by https://github.com/pytorch/xla/pull/7216 to avoid the build issues in CI testing.

Backtrace for the GPU_NUM_DEVICES=2 case:
```
#0  0x00005605c2e3e550 in ?? ()
#1  0x00007f3bf28153cb in xla::TrackedDeviceBuffer::~TrackedDeviceBuffer() ()
   from /usr/local/lib/python3.8/site-packages/torch_xla-2.2.0+git3dce325-py3.8-linux-x86_64.egg/_XLAC.cpython-38-x86_64-linux-gnu.so
#2  0x00007f3bf27e53d0 in absl::lts_20230802::internal_statusor::StatusOrData::~StatusOrData() ()
   from /usr/local/lib/python3.8/site-packages/torch_xla-2.2.0+git3dce325-py3.8-linux-x86_64.egg/_XLAC.cpython-38-x86_64-linux-gnu.so
#3  0x00007f3bf27f6607 in xla::PjRtStreamExecutorBuffer::Delete() ()
...
```

Backtrace with PyTorch also built with DEBUG=1:
```
#0  0x00007fa57f42363a in xla::LocalDeviceState::allocation_model() const ()
   from /usr/local/lib/python3.8/site-packages/torch_xla-2.2.0+git7c46e4c-py3.8-linux-x86_64.egg/_XLAC.cpython-38-x86_64-linux-gnu.so
#1  0x00007fa57f402d9f in xla::PjRtStreamExecutorBuffer::Release(bool) ()
   from /usr/local/lib/python3.8/site-packages/torch_xla-2.2.0+git7c46e4c-py3.8-linux-x86_64.egg/_XLAC.cpython-38-x86_64-linux-gnu.so
#2  0x00007fa57f4032ac in xla::PjRtStreamExecutorBuffer::Delete() ()
   from /usr/local/lib/python3.8/site-packages/torch_xla-2.2.0+git7c46e4c-py3.8-linux-x86_64.egg/_XLAC.cpython-38-x86_64-linux-gnu.so
...
```

I narrowed it down to commit f9c12fc11bb487675515a717ef89ecf954fe539f, which allows the updated test_zero1.py to pass on GPU. Let me cherry-pick it into 2.2 to check that the test still passes.

Confirmed that cherry-picking this change into 2.2 fixes test_zero1.py on GPU.

Thanks @alanwaketan for debugging tips. @JackCaoG @miladm let's cherry-pick/backport https://github.com/pytorch/xla/commit/f9c12fc11bb487675515a717ef89ecf954fe539f and https://github.com/pytorch/xla/commit/a60f8e7c066086af50b677f097e3f1c6559d6918 into 2.2 to fix test_zero1 on GPU?

Thanks. Will take a look. In the meantime, could you check whether CC ops like all-gather are working properly for this test in your setup?
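For reference, a minimal sketch of such a check (a hypothetical script, not part of the test suite; it assumes torch_xla's `xm.all_gather` and `xmp.spawn` APIs and a multi-device setup, e.g. GPU_NUM_DEVICES=2):
```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    device = xm.xla_device()
    # Each replica contributes a tensor filled with its ordinal.
    t = torch.full((4,), float(xm.get_ordinal()), device=device)
    gathered = xm.all_gather(t, dim=0)
    xm.mark_step()
    # With 2 devices, each replica should see [0,0,0,0, 1,1,1,1].
    print(f"ordinal {xm.get_ordinal()}: {gathered.cpu()}")

if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=())
```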

Replacing the ellipsis with ":" helps, but it makes the code less general.
```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

class TestRegisterBuffCls(torch.nn.Module):
    def __init__(self, size):
        super().__init__()
        self.register_buffer('buffer', torch.zeros((size, 100), ...
```
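To illustrate what "less general" means here, a small standalone sketch (plain CPU tensors, hypothetical names) contrasting the two indexing forms:
```python
import torch

buf = torch.zeros((4, 100))
src = torch.ones((4, 100))

buf[...] = src  # Ellipsis: expands to all dimensions, regardless of rank
buf[:] = src    # ":" workaround: slices only dim 0; equivalent here

scalar = torch.zeros(())
scalar[...] = 1.0  # Ellipsis also handles zero-dimensional tensors
# scalar[:] = 1.0  # would raise: slice() cannot be applied to a 0-dim tensor
```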