Contrastive-Adaptation-Network-for-Unsupervised-Domain-Adaptation

Does the FC domain-specific BN introduce biases during inference?

Open Ledzy opened this issue 4 years ago • 7 comments

Thanks for the excellent work and the highly organized code. I'm a little confused about the BN behavior of the FC layer in CAN. From what I observed, during training the target domain's FC BN is not updated; only the source domain's FC BN layer is updated through the CE loss. When testing, however, the target domain's FC BN is used. Would this harm the performance of the network, since the target domain's FC BN hasn't been trained and may introduce biases? Thank you!
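
For context, the domain-specific BN mechanism under discussion can be sketched roughly as below. This is a minimal, self-contained illustration of the idea rather than the repo's code (in the repo, set_bn_domain is called on the whole network, as the snippets later in this thread show), and the class name is hypothetical.

import torch.nn as nn

# Minimal sketch of domain-specific BN (illustrative only, not the repo's code): one
# BatchNorm per domain, selected at run time; all other layers share their parameters.
class DomainSpecificBN1d(nn.Module):  # hypothetical class name
    def __init__(self, num_features, num_domains=2):
        super().__init__()
        self.bns = nn.ModuleList(nn.BatchNorm1d(num_features) for _ in range(num_domains))
        self.domain = 0

    def set_bn_domain(self, domain):
        self.domain = domain

    def forward(self, x):
        # Only the BN chosen for the current domain sees this batch, so only its running
        # statistics (and, via whatever loss is backpropagated, its affine parameters)
        # are updated.
        return self.bns[self.domain](x)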

Ledzy · May 12 '20 05:05

Maybe you can read the original paper:

all the network parameters are shared between the source domain and target domain data other than those of the batch normalization layers which are domain-specific

WDYIE · May 25 '20 03:05

Maybe you can read the original paper:

all the network parameters are shared between the source domain and target domain data other than those of the batch normalization layers which are domain-specific

Yeah, I read it. The domain-specific BNs make sense to me, as they can reduce the feature discrepancy between domains. What I'm confused about is the behavior of the last fully connected layer's BN1: at inference, the last FC's target-domain BN is used, which is never trained. I suspect it may introduce bias. I will update this issue with my experiment later.
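
One hypothetical way to check this suspicion is to inspect the target-domain FC BN's running statistics after training: if that BN was never forwarded in training mode, they should still be at their initial values (mean 0, variance 1). A sketch, with a hypothetical attribute path in the usage comment:

import torch

def bn_looks_untrained(bn: torch.nn.BatchNorm1d, atol: float = 1e-6) -> bool:
    # BatchNorm initializes running_mean to 0 and running_var to 1; if they are still at
    # those values after training, the layer most likely never saw a batch in train() mode.
    return (torch.allclose(bn.running_mean, torch.zeros_like(bn.running_mean), atol=atol)
            and torch.allclose(bn.running_var, torch.ones_like(bn.running_var), atol=atol))

# Usage sketch (the attribute path is hypothetical; adapt to however the FC BN is exposed):
# target_fc_bn = net.module.FC['0'].bn.bns[target_domain_id]
# print(bn_looks_untrained(target_fc_bn))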

Ledzy · May 26 '20 17:05

Hi, I don't think I have added a domain-specific BN layer to the last FC in my implementation. Could you point it out to me if possible?

kgl-prml · May 27 '20 01:05

Perhaps there is a misunderstanding. In fact, there are multiple BN layers in the network, and each has a BN0 and a BN1: BN0 is used when training on the source domain and BN1 when training on the target domain.

                self.net.module.set_bn_domain(self.bn_domain_map[self.source_name])  # switch to source domain BN
                feats_source = self.net(source_cls_concat)
                self.net.module.set_bn_domain(self.bn_domain_map[self.target_name])  # switch to target domain BN
                feats_target = self.net(target_cls_concat)

                # prepare the features
                feats_toalign_S = self.prepare_feats(feats_source)
                feats_toalign_T = self.prepare_feats(feats_target)                 

                cdd_loss = self.cdd.forward(feats_toalign_S, feats_toalign_T, 
                               source_nums_cls, target_nums_cls)[self.discrepancy_key]

                cdd_loss *= self.opt.CDD.LOSS_WEIGHT
                cdd_loss.backward()

WDYIE · May 27 '20 04:05

Hi, I don't think I have added a domain-specific BN layer to the last FC in my implementation. Could you point it out to me if possible?

Thanks for your reply. When defining the FC layers of the CAN model, it uses domain-specific BN (in class DANet):

self.FC[str(k)] = FC_BN_ReLU_Domain(in_dim, out_dim, num_domains_bn)

In update_network of CAN_solver.py, I found that when calculating ce_loss the source domain's BN is used, i.e. the source domain's FC BN is updated:

self.net.module.set_bn_domain(self.bn_domain_map[self.source_name])
source_preds = self.net(source_data)['logits']

# compute the cross-entropy loss
ce_loss = self.CELoss(source_preds, source_gt)
ce_loss.backward()

When testing, however, the net's BN domain is set to the target domain (TEST.DOMAIN), i.e. the FC's target-domain BN is used, which hasn't been trained (in test.py):

if cfg.TEST.DOMAIN in bn_domain_map:
    domain_id = bn_domain_map[cfg.TEST.DOMAIN]
else:
    domain_id = 0

with torch.no_grad():
    net.module.set_bn_domain(domain_id)
    for sample in iter(dataloader): 
        res['path'] += sample['Path']

        if cfg.DATA_TRANSFORM.WITH_FIVE_CROP:
            n, ncrop, c, h, w = sample['Img'].size()
            sample['Img'] = sample['Img'].view(-1, c, h, w)
            img = to_cuda(sample['Img'])
            probs = net(img)['probs']
            probs = probs.view(n, ncrop, -1).mean(dim=1)
        else:
            img = to_cuda(sample['Img'])
            probs = net(img)['probs']

Ledzy · May 27 '20 07:05

Perhaps there is a misunderstanding. In fact, there are multiple BN layers in the network, and each has a BN0 and a BN1: BN0 is used when training on the source domain and BN1 when training on the target domain.

Thank you for your reply. I agree with what you said: there are separate BNs for the source and target domains, while both domains share the conv-layer weights. My problem is that the last fully connected layers (not the feature extractor) also contain domain-specific BN (if I didn't misinterpret the code), and the target-domain BN is not updated during training while it is used at inference.

Ledzy · May 27 '20 07:05

@Ledzy I think you misunderstood this part. First, both the source-domain BN and the target-domain BN are updated during training. The CE loss is imposed on the source data, so you should switch the domain BN to the source mode for it; for the alignment with CDD, I simply switch the domain BN to the target mode before obtaining the target features. You may refer to Line 142 and Line 253 in solver/can_solver.py. Second, I use only a one-layer FC, so with my provided configuration the network actually does not have any FC BN layer. But you can choose to use a multi-layer FC, which does have the FC domain BN layer; it will work correctly in that case, as just described. Hope this helps.
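
For reference, the point that the target-domain BN is updated even though no CE loss is computed on target data can be checked in isolation: BatchNorm's running statistics are updated by the forward pass itself in train() mode, regardless of which loss (if any) is backpropagated. A small standalone snippet, not from the repo:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
print(bn.running_mean)     # tensor([0., 0., 0., 0.]) at initialization

bn.train()
with torch.no_grad():      # no loss and no backward pass at all
    _ = bn(torch.randn(8, 4) + 5.0)

print(bn.running_mean)     # no longer zero: moved toward the batch mean purely from the forward pass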

kgl-prml · Jun 08 '20 21:06