FixMatch-pytorch

num_labeled in DistributedDataParallel

Open LiheYoung opened this issue 4 years ago • 9 comments

When using DistributedDataParallel, if N labeled training images and K GPUs are used, should we set num_labeled = N / K instead of N, since np.random.shuffle(idx) generates different idxs in different processes?
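
(For illustration only, not part of the repo: a quick simulation of the effect. Each of K processes draws its own num_labeled indices with an unseeded shuffle, so the union of the selections is close to K * num_labeled.)

```python
# Illustration only (not from the repo): simulate K DDP processes, each
# drawing num_labeled indices with its own unseeded shuffle.
import numpy as np

num_samples, num_labeled, K = 50000, 250, 4

chosen = set()
for rank in range(K):
    rng = np.random.default_rng()                     # no shared seed, like each DDP process
    idx = rng.permutation(num_samples)[:num_labeled]  # that process's "labeled" subset
    chosen.update(idx.tolist())

# With 50k samples the overlaps are tiny, so this prints roughly K * num_labeled (~1000).
print(len(chosen))
```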

LiheYoung avatar Mar 27 '20 09:03 LiheYoung

I think @LiheYoung is correct -- w/ DistributedDataParallel you launch K copies of the program (one per GPU). If you don't set the seed, then np.random will sample the labeled subset differently in each process, and you end up w/ roughly K * N labeled samples (4 * N with 4 GPUs) instead of the N you're expecting.

This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now, will report results here.

bkj avatar Sep 17 '20 23:09 bkj

In my experiment, performance w/o the seed is substantially better than w/ a seed. I only ran once, so perhaps this is random variation, but I'm guessing this is due to the issue @LiheYoung and I pointed out above.

[Screenshot: accuracy curves]

Red line is w/o seed, blue w/ seed.

Edit: This is for

python -m torch.distributed.launch --nproc_per_node 4 train.py \
    --dataset     cifar10 \
    --num-labeled 250 \
    --arch        wideresnet \
    --batch-size  16 \
    --lr          0.03

bkj avatar Sep 18 '20 00:09 bkj

> I think @LiheYoung is correct -- w/ DistributedDataParallel you launch K copies of the program (one per GPU). If you don't set the seed, then np.random will sample the labeled subset differently in each process, and you end up w/ roughly K * N labeled samples (4 * N with 4 GPUs) instead of the N you're expecting. This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now, will report results here.

> That's right. I should note that you shouldn't use a seed.

> Sorry, a little confused. Should we set the seed?

I think @LiheYoung is correct. With K GPUs, the actual number of labeled samples is K * N rather than N. So should we set num_labeled to N / K, or set the same seed on all GPUs?

chongruo avatar Oct 12 '20 08:10 chongruo

https://github.com/kekmodel/FixMatch-pytorch/blob/10db592088432a0d18f95d74a2e3f6a2dbc25518/dataset/cifar.py#L102-L106

For each GPU, the corresponding process creates its own CIFAR dataset. Since we don't set a fixed seed, the idx array is shuffled (line 104) differently on different GPUs, which results in more labeled samples than intended.
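
One possible fix (a sketch only, with illustrative names loosely modeled on the repo's split logic, not its actual code): draw the labeled indices from a RandomState seeded with a fixed split seed, so every rank selects the same labeled subset even when no global seed is set.

```python
# Sketch (illustrative, not the repo's code): seed the labeled/unlabeled split
# itself, independently of any global seed, so all DDP ranks pick identical idxs.
import numpy as np

def split_labeled(labels, num_labeled, num_classes, split_seed=42):
    labels = np.array(labels)
    per_class = num_labeled // num_classes
    rng = np.random.RandomState(split_seed)            # same seed on every rank
    labeled_idx = []
    for c in range(num_classes):
        cls_idx = np.where(labels == c)[0]
        labeled_idx.extend(rng.choice(cls_idx, per_class, replace=False))
    return np.array(labeled_idx), np.arange(len(labels))  # all data stays unlabeled-eligible
```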

chongruo avatar Oct 19 '20 16:10 chongruo

> I think @LiheYoung is correct -- w/ DistributedDataParallel you launch K copies of the program (one per GPU). If you don't set the seed, then np.random will sample the labeled subset differently in each process, and you end up w/ roughly K * N labeled samples (4 * N with 4 GPUs) instead of the N you're expecting. This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now, will report results here.

> That's right. I should note that you shouldn't use a seed.

> Sorry, a little confused. Should we set the seed?

> I think @LiheYoung is correct. With K GPUs, the actual number of labeled samples is K * N rather than N. So should we set num_labeled to N / K, or set the same seed on all GPUs?

That's right. If you print the idxs, you will find that different idxs are generated in each of the K processes, so the labeled set is actually K times larger than what you set. So we should either set num_labeled to N / K or set the same seed on all GPUs.
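
A third option, sketched below under the assumption that torch.distributed is already initialized (as it is under DDP): draw the labeled indices once on rank 0 and broadcast them, so no rank-wide seeding is needed at all. The helper name is hypothetical.

```python
# Sketch (hypothetical helper): rank 0 draws the labeled indices once and
# broadcasts them, so every DDP rank trains on the exact same labeled subset.
# Assumes torch.distributed is already initialized.
import numpy as np
import torch.distributed as dist

def shared_labeled_idx(num_samples, num_labeled):
    obj = [np.random.permutation(num_samples)[:num_labeled].tolist()
           if dist.get_rank() == 0 else None]
    dist.broadcast_object_list(obj, src=0)   # available in recent PyTorch releases
    return np.array(obj[0])
```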

zhifanwu avatar Nov 21 '20 07:11 zhifanwu

> I think @LiheYoung is correct -- w/ DistributedDataParallel you launch K copies of the program (one per GPU). If you don't set the seed, then np.random will sample the labeled subset differently in each process, and you end up w/ roughly K * N labeled samples (4 * N with 4 GPUs) instead of the N you're expecting. This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now, will report results here.

> That's right. I should note that you shouldn't use a seed.

I think there is a bug in the DDP part of this implementation -- please see the discussion above.

zhifanwu avatar Nov 21 '20 07:11 zhifanwu

Will it be solved by using a seed?

kekmodel avatar Dec 11 '20 02:12 kekmodel

> https://github.com/kekmodel/FixMatch-pytorch/blob/10db592088432a0d18f95d74a2e3f6a2dbc25518/dataset/cifar.py#L102-L106

> For each GPU, the corresponding process creates its own CIFAR dataset. Since we don't set a fixed seed, the idx array is shuffled (line 104) differently on different GPUs, which results in more labeled samples than intended.

Is it necessary to use 4 GPUs to reproduce the results with 40 labels?

moucheng2017 avatar Apr 28 '22 14:04 moucheng2017

> In my experiment, performance w/o the seed is substantially better than w/ a seed. I only ran once, so perhaps this is random variation, but I'm guessing this is due to the issue @LiheYoung and I pointed out above.

> [Screenshot: accuracy curves]

> Red line is w/o seed, blue w/ seed.

> Edit: This is for

>     python -m torch.distributed.launch --nproc_per_node 4 train.py \
>         --dataset     cifar10 \
>         --num-labeled 250 \
>         --arch        wideresnet \
>         --batch-size  16 \
>         --lr          0.03

Hey! I am wondering if you could reproduce the results with 1 GPU?

moucheng2017 avatar Apr 28 '22 14:04 moucheng2017