
Code problem with parallel training

Open woldier opened this issue 4 months ago • 0 comments

https://github.com/lhoyer/MIC/blob/2f932a98b5dd9f598aaeb32411863ceea0809314/seg/mmseg/apis/train.py#L71 In the case of parallel training, the DACS class will be wrapped by DistributedDataParallelWrapper, and the model inside it will be wrapped by MMDistributedDataParallel. Reading the MMDistributedDataParallel source code shows that the only methods through which it synchronizes model parameters across processes are forward, train_step, and val_step.

https://github.com/lhoyer/MIC/blob/2f932a98b5dd9f598aaeb32411863ceea0809314/seg/mmseg/models/uda/dacs.py#L342 However, DACS calls the wrapped model's forward_train directly. Because this bypasses the wrapper's entry points, the model parameters on different nodes would not be automatically synchronized during parallel training.
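To illustrate the concern, here is a minimal sketch (with hypothetical class and method names, not the actual MIC or mmcv code) of why calling an inner module's method directly skips a DDP-style wrapper's synchronization: the sync logic lives only in the wrapper's own entry points, so `wrapper.module.forward_train(...)` never triggers it.

```python
class InnerModel:
    """Stand-in for the segmentation model; forward_train is its
    training entry point, analogous to the call in dacs.py."""
    def forward_train(self, x):
        return x * 2


class DDPStyleWrapper:
    """Mimics the relevant behavior of MMDistributedDataParallel:
    cross-process synchronization happens only inside the wrapper's
    own entry points (forward / train_step / val_step)."""
    def __init__(self, module):
        self.module = module
        self.sync_calls = 0  # counts how often sync was triggered

    def _sync(self):
        # Placeholder for gradient/parameter synchronization hooks.
        self.sync_calls += 1

    def forward(self, x):
        self._sync()
        return self.module.forward_train(x)

    def train_step(self, x):
        self._sync()
        return self.module.forward_train(x)


wrapper = DDPStyleWrapper(InnerModel())
wrapper.train_step(1)            # goes through the wrapper: sync runs
wrapper.module.forward_train(1)  # direct inner call: sync is bypassed
print(wrapper.sync_calls)        # prints 1, not 2
```

If the direct `forward_train` call in dacs.py behaves like the second call above, each process would update its own copy of the weights without synchronization.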

May I ask the author whether the issue I am worried about actually exists?

@lhoyer

woldier avatar Oct 03 '24 01:10 woldier