code problems with parallel training
https://github.com/lhoyer/MIC/blob/2f932a98b5dd9f598aaeb32411863ceea0809314/seg/mmseg/apis/train.py#L71
In the case of parallel training, the DACS class will be wrapped by DistributedDataParallelWrapper, where the inner model is wrapped by MMDistributedDataParallel. Reading the MMDistributedDataParallel source code shows that the only entry points through which it synchronizes model parameters/gradients across processes are forward, train_step, and val_step. However, see:
https://github.com/lhoyer/MIC/blob/2f932a98b5dd9f598aaeb32411863ceea0809314/seg/mmseg/models/uda/dacs.py#L342
Here DACS calls the model's forward_train directly, bypassing those entry points. As a result, the gradients are never all-reduced during the backward pass, so the model weights on different nodes are not automatically kept synchronized during parallel training.
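For illustration, here is a minimal, self-contained PyTorch sketch of the mechanism I am worried about (ToyModel, forward_train, and the gloo setup are hypothetical stand-ins, not the MIC code): calling a custom method on the underlying .module skips the DDP wrapper, so the subsequent backward() never all-reduces gradients.

```python
# Minimal sketch, NOT the MIC code: ToyModel / forward_train are illustrative
# stand-ins. Launch with e.g. `torchrun --nproc_per_node=2 demo.py`.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc(x)

    def forward_train(self, x):
        # Custom training entry point, analogous to forward_train in mmseg.
        return self.fc(x)


def main():
    dist.init_process_group('gloo')  # CPU backend, just for the sketch
    model = DDP(ToyModel())
    x = torch.randn(4, 8)

    # Path 1: going through DDP.forward prepares the reducer, so this
    # backward() all-reduces (averages) the gradients across ranks.
    model(x).sum().backward()
    model.zero_grad()

    # Path 2: calling the custom method on .module bypasses DDP.forward, so
    # the reducer is never prepared; this backward() computes purely local
    # gradients, and the ranks silently diverge after the optimizer step.
    model.module.forward_train(x).sum().backward()

    dist.destroy_process_group()


if __name__ == '__main__':
    main()
```

As far as I can tell, the same reasoning applies to MMDistributedDataParallel, which subclasses torch's DistributedDataParallel and adds train_step/val_step as additional synchronized entry points.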
May I ask the author whether the issue I am worried about actually exists?
@lhoyer