FixMatch-pytorch
num_labeled in DistributedDataParallel
When using DistributedDataParallel, if N labeled training images and K GPUs are used, should we set num_labeled = N / K instead of N? Since np.random.shuffle(idx) generates different idxs in different processes.
I think @LiheYoung is correct -- with DistributedDataParallel you launch one copy of the program per GPU. If you don't set the seed, then np.random will shuffle the dataset differently in each process, and with 4 GPUs you end up with roughly 4 * N labeled samples instead of the N you're expecting.
This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce performance substantially. Running this experiment now, will report results here.
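To make the effect concrete, here is a minimal sketch (plain NumPy, not the repo's actual split code; the constants just mirror the CIFAR-10 run below): simulating K unseeded processes that each keep the first N shuffled indices shows the union growing to roughly K * N distinct labeled images.

```python
# Illustrative sketch only (not the repo's code): simulate K DDP processes,
# each shuffling the index list with an unseeded RNG and keeping the first N
# entries as its "labeled" subset.
import numpy as np

num_images = 50000   # CIFAR-10 training-set size
N = 250              # --num-labeled
K = 4                # --nproc_per_node

subsets = []
for rank in range(K):
    rng = np.random.RandomState()      # unseeded -> different state per process
    idx = np.arange(num_images)
    rng.shuffle(idx)
    subsets.append(set(idx[:N]))

# union of the per-process selections: roughly K * N (~1000) distinct images
print(len(set().union(*subsets)))
```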
In my experiment, performance w/o the seed is substantially better than w/ a seed. I only ran once, so perhaps this is random variation, but I'm guessing this is due to the issue @LiheYoung and I pointed out above.

Red line is w/o seed, blue w/ seed.
Edit: This is for
python -m torch.distributed.launch --nproc_per_node 4 train.py --dataset cifar10 --num-labeled 250 --arch wideresnet --batch-size 16 --lr 0.03
That's right. I need to tell you not to use a seed.
Sorry, a little confused. Should we set the seed?
I think @LiheYoung is correct. With K GPUs, the actual number of labeled samples is K * N rather than N. So should we set the labeled number to N / K, or set the same seed for all GPUs?
https://github.com/kekmodel/FixMatch-pytorch/blob/10db592088432a0d18f95d74a2e3f6a2dbc25518/dataset/cifar.py#L102-L106
For each GPU, the corresponding process creates its own CIFAR dataset. Since we don't set a fixed seed, the idx is shuffled (line 104) differently on each GPU, which results in more labeled samples being used than intended.
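A hedged sketch of one possible fix (the function name select_labeled_idx and its arguments are illustrative, not the repo's API): seed the shuffle identically on every rank, so all K processes end up with the same N labeled indices.

```python
# Hedged sketch, not the repo's code: seed NumPy identically on every rank
# before shuffling, so all DDP processes select the same labeled subset.
import numpy as np

def select_labeled_idx(num_images, num_labeled, seed):
    rng = np.random.RandomState(seed)   # same seed on every GPU/process
    idx = np.arange(num_images)
    rng.shuffle(idx)
    return idx[:num_labeled]            # identical across all K ranks

labeled_idx = select_labeled_idx(num_images=50000, num_labeled=250, seed=5)
```

An equivalent alternative is to select the indices only on rank 0 and broadcast them to the other ranks.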
That's right. If you print the idxs, you will find that K different sets of idxs are generated, so the number of labeled samples actually used is K times what you set. So we should either set the labeled number to N / K or set the same seed for all GPUs.
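If you want to check this directly, something along these lines could be run after the split (a hedged sketch; it assumes the process group has already been initialized by torch.distributed.launch and needs PyTorch >= 1.8 for all_gather_object):

```python
# Hedged sketch: verify whether all ranks selected the same labeled indices.
import torch.distributed as dist

def check_labeled_idx(labeled_idx):
    world_size = dist.get_world_size()
    gathered = [None] * world_size
    dist.all_gather_object(gathered, sorted(int(i) for i in labeled_idx))
    union = set().union(*(set(g) for g in gathered))
    if dist.get_rank() == 0:
        # with the bug this prints ~K * N; with a shared seed it prints N
        print(f"distinct labeled indices across {world_size} ranks: {len(union)}")
```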
I think there is a bug in the DDP implementation; please see the discussion above.
Will it be solved by using a seed?
Is it necessary to use 4 GPUs to reproduce the results with 40 labels?
Hey! I am wondering if you could reproduce the results with 1 GPU?