sparse-to-dense.pytorch
Same Training Data on Resume
When the train loader is created, each worker's NumPy RNG is seeded with that worker's ID:
if not args.evaluate:
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True, sampler=None,
        worker_init_fn=lambda work_id: np.random.seed(work_id))
When I then train the model for more than 15 epochs (i.e. with the --resume and --epochs arguments), the worker IDs, and hence the seeds, are the same as in the first run, so the network is essentially trained on the same data twice.
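To make the effect concrete, here is a minimal stand-alone sketch (worker_stream is a hypothetical helper, not code from the repo) showing that a per-worker NumPy stream seeded with the fixed worker ID replays identically on every run:

import numpy as np

def worker_stream(work_id, n=3):
    np.random.seed(work_id)   # same fixed seed on every (re)start
    return np.random.rand(n)  # stands in for random augmentation parameters

print(worker_stream(0))  # first training run
print(worker_stream(0))  # resumed run: identical "random" values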
I would suggest modifying the above code, e.g. to:
    worker_init_fn=lambda work_id: np.random.seed(work_id + args.epochs))
so that new training data is generated when the training is resumed.
That's indeed a problem. However, work_id+args.epochs is probably not the ideal solution because of possible seed overlap. For instance, work_id=20 at epochs=15 produces the same seed (35) as work_id=15 at epochs=20 when resuming.
You are right. What about
    worker_init_fn=lambda work_id: np.random.seed(work_id + args.workers * epoch))
where epoch is incremented at the end of every epoch. This would, of course, require rebuilding the train_loader on each epoch, unless there is a way to update the seeds at runtime with something like:
def set_epoch(self, epoch):
    self.worker_init_fn = lambda work_id: np.random.seed(work_id + self.num_workers * epoch)
in dataloader.py, which probably does not work, at least not in this simple form.
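For concreteness, a sketch of the rebuild-per-epoch variant (train, model and start_epoch are illustrative placeholders, not code from the repo):

import numpy as np
import torch

for epoch in range(start_epoch, args.epochs):
    # Rebuilding the loader gives this epoch's workers their own seed range:
    # work_id + args.workers*epoch covers [args.workers*epoch, args.workers*(epoch+1) - 1].
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True, sampler=None,
        # epoch=epoch binds the current value into the lambda; a plain closure
        # would read whatever epoch happens to be when a worker starts.
        worker_init_fn=lambda work_id, epoch=epoch:
            np.random.seed(work_id + args.workers * epoch))
    train(train_loader, model, epoch)  # hypothetical per-epoch training step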
Preferably, we wouldn't have to rebuild the data loaders at every epoch. Additionally, this workaround can still produce overlapping seeds if one resumes with a different number of workers: for instance, work_id=0 with args.workers=4 at epoch 5 and work_id=0 with args.workers=5 at epoch 4 both yield seed 20.
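One possible way around both issues, sketched here as a suggestion rather than a tested fix: inside a worker, torch.initial_seed() returns the DataLoader's base seed plus the worker ID, and the base seed is redrawn from the main-process RNG each time the loader is iterated. Seeding NumPy from that value therefore varies per epoch and per worker without rebuilding the loader, assuming the global torch seed is not re-fixed to the same value on resume:

import numpy as np
import torch

def worker_init_fn(work_id):
    # torch.initial_seed() in a worker equals base_seed + work_id; the
    # DataLoader draws a fresh base_seed each epoch, so the NumPy streams
    # differ across epochs, worker counts, and (re)runs.
    np.random.seed(torch.initial_seed() % 2**32)

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, shuffle=True,
    num_workers=args.workers, pin_memory=True, sampler=None,
    worker_init_fn=worker_init_fn)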