FixMatch-pytorch
num_labeled in DistributedDataParallel
When using DistributedDataParallel, if N labeled training images and K GPUs are used, should we set num_labeled = N / K instead of N? Since np.random.shuffle(idx) generates different idxs in different processes.
I think @LiheYoung is correct -- with DistributedDataParallel you launch one copy of the program per GPU. If you don't set the seed, then np.random will shuffle the dataset differently in each process, and with 4 GPUs you end up with roughly 4 * N labeled samples instead of the N you're expecting.
This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce performance substantially. Running this experiment now, will report results here.
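To make the effect concrete, here is a minimal sketch (plain NumPy, not the repo's actual split code; the constants just mirror the CIFAR-10 run below): simulating K unseeded processes that each keep the first N shuffled indices shows the union growing to roughly K * N distinct labeled images.

```python
# Illustrative sketch only (not the repo's code): simulate K DDP processes,
# each shuffling the index list with an unseeded RNG and keeping the first N
# entries as its "labeled" subset.
import numpy as np

num_images = 50000   # CIFAR-10 training-set size
N = 250              # --num-labeled
K = 4                # --nproc_per_node

subsets = []
for rank in range(K):
    rng = np.random.RandomState()      # unseeded -> different state per process
    idx = np.arange(num_images)
    rng.shuffle(idx)
    subsets.append(set(idx[:N]))

# union of the per-process selections: roughly K * N (~1000) distinct images
print(len(set().union(*subsets)))
```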
In my experiment, performance w/o the seed is substantially better than w/ a seed. I only ran once, so perhaps this is random variation, but I'm guessing this is due to the issue @LiheYoung and I pointed out above.

Red line is w/o seed, blue w/ seed.
Edit: This is for
python -m torch.distributed.launch --nproc_per_node 4 train.py --dataset cifar10 --num-labeled 250 --arch wideresnet --batch-size 16 --lr 0.03
That's right. I need to tell you not to use a seed.
Sorry, a little confused. Should we set the seed?
I think @LiheYoung is correct. With K GPUs, the actual number of labeled samples is K * N rather than N. So should we set the labeled number to N / K, or set the same seed for all GPUs?
https://github.com/kekmodel/FixMatch-pytorch/blob/10db592088432a0d18f95d74a2e3f6a2dbc25518/dataset/cifar.py#L102-L106
For each GPU, the corresponding process creates its own CIFAR dataset. Since we don't set a fixed seed, the idx is shuffled (line 104) differently on each GPU, which results in more labeled samples being used than intended.
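A hedged sketch of one possible fix (the function name select_labeled_idx and its arguments are illustrative, not the repo's API): seed the shuffle identically on every rank, so all K processes end up with the same N labeled indices.

```python
# Hedged sketch, not the repo's code: seed NumPy identically on every rank
# before shuffling, so all DDP processes select the same labeled subset.
import numpy as np

def select_labeled_idx(num_images, num_labeled, seed):
    rng = np.random.RandomState(seed)   # same seed on every GPU/process
    idx = np.arange(num_images)
    rng.shuffle(idx)
    return idx[:num_labeled]            # identical across all K ranks

labeled_idx = select_labeled_idx(num_images=50000, num_labeled=250, seed=5)
```

An equivalent alternative is to select the indices only on rank 0 and broadcast them to the other ranks.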
That's right. If you print the idxs, you will find that K different sets of idxs are generated, so the number of labeled samples actually used is K times what you set. So we should either set the labeled number to N / K or set the same seed for all GPUs.
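If you want to check this directly, something along these lines could be run after the split (a hedged sketch; it assumes the process group has already been initialized by torch.distributed.launch and needs PyTorch >= 1.8 for all_gather_object):

```python
# Hedged sketch: verify whether all ranks selected the same labeled indices.
import torch.distributed as dist

def check_labeled_idx(labeled_idx):
    world_size = dist.get_world_size()
    gathered = [None] * world_size
    dist.all_gather_object(gathered, sorted(int(i) for i in labeled_idx))
    union = set().union(*(set(g) for g in gathered))
    if dist.get_rank() == 0:
        # with the bug this prints ~K * N; with a shared seed it prints N
        print(f"distinct labeled indices across {world_size} ranks: {len(union)}")
```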
I think there is a bug in the DDP implementation; please see the discussion above.
Will it be solved by using a seed?
Is it necessary to use 4 GPUs to reproduce the results with 40 labels?
Hey! I am wondering if you could reproduce the results with 1 GPU?