Danielle Pintz
Danielle Pintz
Summary: Since this will be used mostly for local debugging, let's only print this on rank 0. Differential Revision: D45975140
Summary: We want TSS to automatically save a checkpoint at the end of training. Differential Revision: D45920596
Differential Revision: D49399892
Summary: Attempt to fix torchsnapshot CI: https://github.com/pytorch/torchsnapshot/actions/runs/5766115388/job/15694536972 ``` tests/test_uvm_tensor.py::test_uvm_tensor FAILED [100%] =================================== FAILURES =================================== _______________________________ test_uvm_tensor ________________________________ pytest.mark.cpu_and_gpu def test_uvm_tensor() -> None: if torch.cuda.is_available() and _UVM_TENSOR_AVAILABLE: uvm_tensor = torch.rand( (64,...