sparse-to-dense.pytorch
Same Training Data on Resume
When the train loader is created, each worker's NumPy RNG is seeded with that worker's ID:
if not args.evaluate:
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True, sampler=None,
        worker_init_fn=lambda work_id: np.random.seed(work_id))
When I then train the model for more than 15 epochs (i.e. with the --resume and --epochs arguments), the worker IDs, and hence the seeds, are the same as in the first run, so the network is essentially trained on the same data twice.
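To make the effect concrete, here is a minimal stand-alone sketch (worker_stream is a hypothetical helper, not code from the repo) showing that a per-worker NumPy stream seeded with the fixed worker ID replays identically on every run:

import numpy as np

def worker_stream(work_id, n=3):
    np.random.seed(work_id)   # same fixed seed on every (re)start
    return np.random.rand(n)  # stands in for random augmentation parameters

print(worker_stream(0))  # first training run
print(worker_stream(0))  # resumed run: identical "random" values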
I would suggest modifying the above code, e.g. to:
    worker_init_fn=lambda work_id: np.random.seed(work_id + args.epochs))
so that new training data is generated when the training is resumed.
That's indeed a problem. However, work_id+args.epochs is probably not the ideal solution because of possible seed overlap. For instance, work_id=20 at epochs=15 produces the same seed (35) as work_id=15 at epochs=20 when resuming.
You are right. What about
    worker_init_fn=lambda work_id: np.random.seed(work_id + args.workers * epoch))
where epoch is incremented at the end of every epoch. This would, of course, require rebuilding the train_loader on each epoch, unless there is a way to update the seeds at runtime with something like:
def set_epoch(self, epoch):
    self.worker_init_fn = lambda work_id: np.random.seed(work_id + self.num_workers * epoch)
in dataloader.py, which probably does not work, at least not in this simple form.
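For concreteness, a sketch of the rebuild-per-epoch variant (train, model and start_epoch are illustrative placeholders, not code from the repo):

import numpy as np
import torch

for epoch in range(start_epoch, args.epochs):
    # Rebuilding the loader gives this epoch's workers their own seed range:
    # work_id + args.workers*epoch covers [args.workers*epoch, args.workers*(epoch+1) - 1].
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True, sampler=None,
        # epoch=epoch binds the current value into the lambda; a plain closure
        # would read whatever epoch happens to be when a worker starts.
        worker_init_fn=lambda work_id, epoch=epoch:
            np.random.seed(work_id + args.workers * epoch))
    train(train_loader, model, epoch)  # hypothetical per-epoch training step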
Preferably, we wouldn't have to rebuild the data loaders at every epoch. Additionally, this workaround can still produce overlapping seeds if one resumes with a different number of workers: for instance, work_id=0 with args.workers=4 at epoch 5 and work_id=0 with args.workers=5 at epoch 4 both yield seed 20.
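One possible way around both issues, sketched here as a suggestion rather than a tested fix: inside a worker, torch.initial_seed() returns the DataLoader's base seed plus the worker ID, and the base seed is redrawn from the main-process RNG each time the loader is iterated. Seeding NumPy from that value therefore varies per epoch and per worker without rebuilding the loader, assuming the global torch seed is not re-fixed to the same value on resume:

import numpy as np
import torch

def worker_init_fn(work_id):
    # torch.initial_seed() in a worker equals base_seed + work_id; the
    # DataLoader draws a fresh base_seed each epoch, so the NumPy streams
    # differ across epochs, worker counts, and (re)runs.
    np.random.seed(torch.initial_seed() % 2**32)

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True, sampler=None,
    worker_init_fn=worker_init_fn)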