awesome-semantic-segmentation-pytorch

save_checkpoint may cause a fault when skip_val is not set

Open · liubo0902 opened this issue 5 years ago • 1 comment

There is a small bug in the validation() function in train.py: save_checkpoint() should be called only when save_to_disk is True.

liubo0902, Sep 05 '19 06:09

Confirmed. This bug shows up in distributed training: every process writes the checkpoint at the same time, which corrupts the file. The fix is the one you suggested, saving the checkpoint only on rank 0.

bijjuair, May 15 '20 09:05
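
For illustration, the rank-0 guard discussed in this thread might look roughly like the sketch below. It is not the repository's code: `is_main_process`, the pickle-based `save_checkpoint`, and the simplified `validation` signature are hypothetical stand-ins, and the rank is passed in explicitly rather than read from the distributed backend. The temp-file-then-rename step is an extra precaution beyond what the thread proposes.

```python
import os
import pickle
import tempfile


def is_main_process(rank: int) -> bool:
    """Hypothetical helper: only rank 0 is allowed to write to disk."""
    return rank == 0


def save_checkpoint(state: dict, path: str) -> None:
    """Hypothetical stand-in for the repo's save_checkpoint(); pickles `state`."""
    # Write to a temporary file in the same directory, then rename it into
    # place, so a reader never sees a partially written checkpoint.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def validation(state: dict, path: str, rank: int) -> bool:
    """Simplified validation step; returns True if a checkpoint was written."""
    save_to_disk = is_main_process(rank)
    # ... model evaluation would happen here on every rank ...
    if save_to_disk:
        # Guard the write so that only one process touches the file.
        save_checkpoint(state, path)
    return save_to_disk
```

With this guard, every rank still runs the evaluation, but only one process writes the file, so concurrent writes cannot corrupt it.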