About epoch settings
Here, apparently in order to solve the spawn problem, the epoch is shifted by self.distributed_rank:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/trainer.py#L382
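For context, a minimal sketch of the pattern we mean (illustrative only, not the repo's actual code; the name distributed_rank follows the linked trainer.py):

```python
# Illustrative only: each process derives its own "epoch" by adding its
# distributed rank, so rank 0 and rank 1 feed different epoch values
# into the data pipeline for the same real epoch.
def shifted_epoch(epoch: int, distributed_rank: int) -> int:
    return epoch + distributed_rank
```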
Here, sampler.set_epoch() uses the shifted epoch, so different GPUs sample as if they were at different epochs:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/data/dynamic_dataloader.py#L67-L78
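A self-contained sketch of what that shift does, assuming the loader builds on a standard torch.utils.data.DistributedSampler (an assumption on our part; the repo's own loader is at the link):

```python
import torch
from torch.utils.data import DistributedSampler, TensorDataset

# Illustrative: set_epoch() only changes the seed of the sampler's
# shuffle, so passing epoch + rank makes each rank shuffle as if it
# were in a different epoch.
dataset = TensorDataset(torch.arange(16))
epoch, world_size = 3, 2
for rank in range(world_size):
    sampler = DistributedSampler(dataset, num_replicas=world_size,
                                 rank=rank, shuffle=True)
    sampler.set_epoch(epoch + rank)   # the shifted epoch from the trainer
    print(rank, list(iter(sampler)))  # per-rank indices under the shift
```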
Then, the DataLoader passes the shifted epoch as a parameter to get_worker_init_fn:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/data/dynamic_dataloader.py#L78-L91
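Roughly, the wiring looks like the following. This is a sketch under the assumption that get_worker_init_fn (name from the repo) returns the callback the DataLoader runs in each worker; the real signature and body are at the link, and the seed formula here is made up:

```python
from functools import partial

import torch
from torch.utils.data import DataLoader, TensorDataset

def _seed_worker(worker_id: int, epoch: int, rank: int) -> None:
    # Runs once inside each freshly started worker process.
    # Illustrative mixing only; the repo's formula differs.
    torch.manual_seed((epoch * 10_000 + rank * 100 + worker_id) % 2**32)

def get_worker_init_fn(epoch: int, rank: int):
    # The shifted epoch is baked into the callback at loader-build time.
    return partial(_seed_worker, epoch=epoch, rank=rank)

if __name__ == "__main__":
    dataset = TensorDataset(torch.arange(16))
    loader = DataLoader(dataset, batch_size=4, num_workers=2,
                        worker_init_fn=get_worker_init_fn(epoch=3 + 1, rank=1))
    for batch in loader:  # workers get seeded via the init fn above
        pass
```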
Here, worker_seed is computed from the shifted epoch together with several other factors, and we found that distributed_rank is already among them:
https://github.com/facebookresearch/vggt/blob/44b3afbd1869d8bde4894dd8ea1e293112dd5eba/training/data/worker_fn.py#L84-L99
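The exact formula is in the linked worker_fn.py; the shape of the computation is something like this (illustrative constants, not the repo's):

```python
def worker_seed(base_seed: int, epoch: int, distributed_rank: int,
                worker_id: int) -> int:
    # Illustrative mixing only. The key observation: distributed_rank
    # is already an input here, independent of any epoch shifting.
    return (base_seed + epoch * 10_000 + distributed_rank * 100
            + worker_id) % 2**32
```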
So it seems that shifting the epoch is unnecessary: distributed_rank already enters the worker seed on its own, as the quick check below shows.
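Using the illustrative formula above: even with the unshifted epoch, two ranks already get different worker seeds.

```python
epoch = 5  # unshifted
for rank in (0, 1):
    seed = (0 + epoch * 10_000 + rank * 100 + 0) % 2**32
    print(rank, seed)  # 0 -> 50000, 1 -> 50100: distinct without any shift
```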
Can anyone give a clear explanation?
Hi, if I remember correctly, this set_epoch() shift was added for other experimental features. It should not affect training. Please let me know if you think there is a bug.