ContrastiveSeg

Dataloader distributed or not

Open · zhd2rng opened this issue 1 year ago · 0 comments

Hi, I was checking the logging file hrnet_w48_contrast_lr1x_hrnet_contrast_t0.1.log. The epoch and iteration counts seem to be computed as if training ran on a single GPU, even though it is a 4-GPU job.

Basically, for a 4-GPU job with batch size 8 per GPU, one epoch over the 2975-image Cityscapes training set should take 93 iterations (2975 / (4 × 8) ≈ 93) if the dataloader is distributed across all GPUs. In the log, however, one epoch has four times that many iterations. This raises the question of whether the dataloader is actually distributed over multiple GPUs, and how iterations per epoch are counted, since that affects how the learning rate schedule and warm-up iterations should be configured.
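For concreteness, here is a minimal PyTorch sketch (not the repository's actual code; the dataset class and shapes are placeholders) showing how the per-epoch iteration count changes depending on whether a DistributedSampler is attached, using the dataset size and batch size from the numbers above:

```python
import torch
from torch.utils.data import DataLoader, Dataset, DistributedSampler

class CityscapesLike(Dataset):
    """Placeholder dataset with the Cityscapes training-set size."""
    def __len__(self):
        return 2975
    def __getitem__(self, idx):
        return torch.zeros(3, 512, 1024)  # dummy image tensor

dataset = CityscapesLike()

# Non-distributed: every process iterates over the full dataset.
loader_single = DataLoader(dataset, batch_size=8)
print(len(loader_single))  # ceil(2975 / 8) = 372 iterations per epoch

# Distributed: each of the 4 ranks sees only its shard of the data.
sampler = DistributedSampler(dataset, num_replicas=4, rank=0, shuffle=True)
loader_dist = DataLoader(dataset, batch_size=8, sampler=sampler)
print(len(loader_dist))  # ceil(ceil(2975 / 4) / 8) = 93 iterations per epoch
```

The log showing 372 iterations per epoch would be consistent with the non-distributed case above, i.e., each GPU iterating over the full training set.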

Thanks.

zhd2rng · Apr 03 '23 22:04