awesome-semantic-segmentation-pytorch
save_checkpoint may cause a fault when skip_val is not set
There is a small bug in the validation() function in train.py: save_checkpoint() should only be called when save_to_disk is True.
True. This is a bug that shows up in distributed training. Because every process writes the checkpoint at the same time, the checkpoint file can get corrupted. The fix is as you suggested: save the checkpoint only on the process with rank 0.
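For reference, the guard could look roughly like the sketch below. It is not the repository's actual code: `save_checkpoint_guarded`, its `rank` argument, and the `save_fn` parameter are hypothetical names used for illustration, and the default serializer uses `pickle` only so the example runs without PyTorch. In a real training script one would presumably pass `torch.save` as `save_fn`.

```python
import os
import pickle


def save_checkpoint_guarded(state, path, rank, save_fn=None):
    """Write `state` to `path` only on the main process (rank 0).

    Hypothetical helper, not part of the repository.
    Returns True if this process wrote the file, False otherwise.
    """
    if rank != 0:
        # Non-main ranks skip the write, so only one process touches the file.
        return False

    if save_fn is None:
        # Stand-in serializer (assumption); pass torch.save in a PyTorch script.
        def save_fn(obj, target):
            with open(target, "wb") as fh:
                pickle.dump(obj, fh)

    save_fn(state, path)
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        ckpt = os.path.join(tmp, "model.pkl")
        # Simulate two ranks calling the helper with the same target path.
        for rank in (1, 0):
            wrote = save_checkpoint_guarded({"epoch": 3}, ckpt, rank)
            print(f"rank {rank}: wrote checkpoint = {wrote}")
```

In validation() this would amount to wrapping the existing save_checkpoint() call in `if save_to_disk:`, assuming save_to_disk is True only on rank 0, as the reply above implies.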