Jianfeng Wang
Is there any interface to avoid empty clusters?
In gradient accumulation, we do not need to synchronize the gradients for the first N - 1 iterations. With PyTorch DDP, we can use no_sync() as follows. In...
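A minimal sketch of this pattern, assuming `model` is already wrapped in `DistributedDataParallel` and `accum_steps` is the accumulation window N; the loss function and data-loader names are placeholders:

```python
import contextlib

import torch
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def train_epoch(model: DDP, optimizer, data_loader, accum_steps: int):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(data_loader):
        # Skip the gradient all-reduce for the first N - 1 micro-batches;
        # gradients are only synchronized on the final backward of the window.
        is_last = (i + 1) % accum_steps == 0
        ctx = contextlib.nullcontext() if is_last else model.no_sync()
        with ctx:
            loss = F.cross_entropy(model(x), y)
            (loss / accum_steps).backward()
        if is_last:
            optimizer.step()
            optimizer.zero_grad()
```

Skipping the all-reduce on the intermediate backwards avoids N - 1 rounds of communication per effective batch; the locally accumulated gradients are reduced once at the end of the window.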
By default, 300 epochs are used for training. On a machine with 4 P100s, it takes about 21 days. Is this normal? How is the training time with V100...
I find that the proposed training strategy is to 1) train the backbone with the labels and the contrastive loss, and 2) fine-tune the last linear layer. The baseline approach is to train the...
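A minimal sketch of stage 2 as described above (fine-tuning only the last linear layer on top of a frozen backbone); the ResNet-50 architecture, feature width, and class count are illustrative assumptions, not details from the post:

```python
import torch
import torch.nn as nn
import torchvision

# Hypothetical stage-1 result: a backbone pretrained with labels + contrastive
# loss (weights would be loaded from the stage-1 checkpoint in practice).
backbone = torchvision.models.resnet50()
backbone.fc = nn.Identity()          # expose the 2048-d features
for p in backbone.parameters():
    p.requires_grad = False          # stage 2: freeze the backbone
backbone.eval()

classifier = nn.Linear(2048, 1000)   # only this layer is trained
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)

x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 1000, (8,))
with torch.no_grad():
    feats = backbone(x)              # frozen features
loss = nn.functional.cross_entropy(classifier(feats), y)
loss.backward()
optimizer.step()
```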
If there are N GPUs, the snapshot will be N files for the optimizer states, each file corresponding to one GPU (let me know if this understanding is incorrect). Then,...
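A minimal sketch of that one-file-per-GPU layout, assuming one process per GPU with `torch.distributed` initialized; the directory layout and file names here are hypothetical:

```python
import torch
import torch.distributed as dist

def save_optimizer_shard(optimizer, out_dir="snapshot"):
    # Each rank writes only its own optimizer state, giving N files total.
    rank = dist.get_rank()
    torch.save(optimizer.state_dict(), f"{out_dir}/optimizer_rank{rank}.pt")

def load_optimizer_shard(optimizer, out_dir="snapshot"):
    # On resume, each rank loads back the file it wrote.
    rank = dist.get_rank()
    state = torch.load(f"{out_dir}/optimizer_rank{rank}.pt", map_location="cpu")
    optimizer.load_state_dict(state)
```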