About epoch settings
Here, apparently in order to solve the spawn problem, the epoch is shifted by self.distributed_rank:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/trainer.py#L382
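For context, a minimal sketch of the pattern we mean (illustrative only, not the repo's actual code; the name distributed_rank follows the linked trainer.py):

```python
# Illustrative only: each process derives its own "epoch" by adding its
# distributed rank, so rank 0 and rank 1 feed different epoch values
# into the data pipeline for the same real epoch.
def shifted_epoch(epoch: int, distributed_rank: int) -> int:
    return epoch + distributed_rank
```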
Here, sampler.set_epoch() uses the shifted epoch, so different GPUs sample as if they were at different epochs:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/data/dynamic_dataloader.py#L67-L78
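A self-contained sketch of what that shift does, assuming the loader builds on a standard torch.utils.data.DistributedSampler (an assumption on our part; the repo's own loader is at the link):

```python
import torch
from torch.utils.data import DistributedSampler, TensorDataset

# Illustrative: set_epoch() only changes the seed of the sampler's
# shuffle, so passing epoch + rank makes each rank shuffle as if it
# were in a different epoch.
dataset = TensorDataset(torch.arange(16))
epoch, world_size = 3, 2
for rank in range(world_size):
    sampler = DistributedSampler(dataset, num_replicas=world_size,
                                 rank=rank, shuffle=True)
    sampler.set_epoch(epoch + rank)   # the shifted epoch from the trainer
    print(rank, list(iter(sampler)))  # per-rank indices under the shift
```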
Then, the DataLoader passes the shifted epoch as a parameter to get_worker_init_fn:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/data/dynamic_dataloader.py#L78-L91
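Roughly, the wiring looks like the following. This is a sketch under the assumption that get_worker_init_fn (name from the repo) returns the callback the DataLoader runs in each worker; the real signature and body are at the link, and the seed formula here is made up:

```python
from functools import partial

import torch
from torch.utils.data import DataLoader, TensorDataset

def _seed_worker(worker_id: int, epoch: int, rank: int) -> None:
    # Runs once inside each freshly started worker process.
    # Illustrative mixing only; the repo's formula differs.
    torch.manual_seed((epoch * 10_000 + rank * 100 + worker_id) % 2**32)

def get_worker_init_fn(epoch: int, rank: int):
    # The shifted epoch is baked into the callback at loader-build time.
    return partial(_seed_worker, epoch=epoch, rank=rank)

if __name__ == "__main__":
    dataset = TensorDataset(torch.arange(16))
    loader = DataLoader(dataset, batch_size=4, num_workers=2,
                        worker_init_fn=get_worker_init_fn(epoch=3 + 1, rank=1))
    for batch in loader:  # workers get seeded via the init fn above
        pass
```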
Here, worker_seed is computed from the shifted epoch together with several other factors, and we found that distributed_rank is already among them:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/data/worker_fn.py#L84-L99
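The exact formula is in the linked worker_fn.py; the shape of the computation is something like this (illustrative constants, not the repo's):

```python
def worker_seed(base_seed: int, epoch: int, distributed_rank: int,
                worker_id: int) -> int:
    # Illustrative mixing only. The key observation: distributed_rank
    # is already an input here, independent of any epoch shifting.
    return (base_seed + epoch * 10_000 + distributed_rank * 100
            + worker_id) % 2**32
```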
So it seems that shifting the epoch is unnecessary: distributed_rank already enters the worker seed on its own, as the quick check below shows.
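Using the illustrative formula above: even with the unshifted epoch, two ranks already get different worker seeds.

```python
epoch = 5  # unshifted
for rank in (0, 1):
    seed = (0 + epoch * 10_000 + rank * 100 + 0) % 2**32
    print(rank, seed)  # 0 -> 50000, 1 -> 50100: distinct without any shift
```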
Can anyone give a clear explanation?
Hi, if I remember correctly, this set_epoch() shift was added for other experimental features. It should not affect training. Please let me know if you think there is a bug.