
synchronize before creating output_dir

Open zmzhang2000 opened this issue 1 year ago • 0 comments

In multi-process training, one process may create the log file in training_args.output_dir before the other processes have checked whether len(os.listdir(training_args.output_dir)) > 0. Those processes then see a non-empty directory and raise a ValueError.

Synchronizing the processes with torch.distributed.barrier() before anything is written to output_dir fixes this problem.
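
A minimal sketch of the idea, not the actual patch: every rank checks the directory, then all ranks wait at a barrier before any of them is allowed to write to it. The helper names check_output_dir and barrier_if_distributed and the overwrite parameter are hypothetical and only for illustration. The sketch assumes the process group, if there is one, has already been initialized; in a single-process run the barrier is skipped.

```python
import os

try:
    import torch.distributed as dist
except ImportError:  # lets the sketch run on a machine without PyTorch
    dist = None


def barrier_if_distributed():
    """Wait for all ranks; do nothing when not running distributed."""
    if dist is not None and dist.is_available() and dist.is_initialized():
        dist.barrier()


def check_output_dir(output_dir, overwrite=False):
    """Hypothetical helper: check output_dir on every rank, then synchronize."""
    error = None
    if (
        os.path.isdir(output_dir)
        and len(os.listdir(output_dir)) > 0
        and not overwrite
    ):
        error = ValueError(
            f"Output directory ({output_dir}) already exists and is not empty."
        )
    # No rank continues (and possibly creates a log file) until every rank
    # has finished inspecting the directory.
    barrier_if_distributed()
    # Raise only after the barrier so a failing rank cannot leave others waiting.
    if error is not None:
        raise error
```

The call would go before file logging is set up, for example check_output_dir(training_args.output_dir), so that the first write to the directory happens only after all ranks have passed the barrier.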

zmzhang2000, Nov 08 '23 07:11