traincheck-team
Results: 12 comments of traincheck-team
@loadams Hi Logan, I apologize for the late reply. I’ve reviewed the 9 unit test failures in the recent workflow run: https://github.com/deepspeedai/DeepSpeed/actions/runs/13205140637/job/36866442471. My understanding is that these failures are caused...
I found the following code segment that seems to be linked to the reported behavior. The root cause is the non-atomic snapshot write in `save()` combined with the one-way rename...
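The `save()` code referred to above is not reproduced here. The following is therefore only a generic sketch of the write-to-temp-then-rename pattern that is commonly used to make a snapshot write atomic, so a concurrent reader or a crash never observes a half-written file. `atomic_save` is a hypothetical name used for illustration; it is not a DeepSpeed API. The sketch assumes a POSIX filesystem, where `os.replace` within a single filesystem is atomic.

```python
import os
import tempfile


def atomic_save(path: str, data: bytes) -> None:
    """Hypothetical helper (not DeepSpeed's save()): write `data` to `path`
    so that readers see either the old snapshot or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename
    # does not cross filesystems (a cross-device rename is not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".snapshot")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Push the bytes to disk before the rename makes them visible.
            os.fsync(f.fileno())
        # Swap the complete file into place, overwriting any existing snapshot.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "model.snapshot")
    atomic_save(target, b"step=1")
    atomic_save(target, b"step=2")
    with open(target, "rb") as f:
        print(f.read())  # b'step=2'
```

The key design choice is that the final name only ever points at a fully written file. Writing directly to the final path, or renaming the old snapshot away before the new one is complete, leaves a window in which readers may find a truncated file or no file at all.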