
Result has huge difference

Open tengerye opened this issue 6 years ago • 27 comments

Hi, the results of my code differ greatly from yours. At epoch 99, my results are:

Source Accuracy: 9839/10000 (98.0000%)
Target Accuracy: 5958/9001 (66.0000%)
Domain Accuracy: 10646/19001 (56.0000%)

May I ask how many epochs you ran to get your result? Thank you in advance.

tengerye avatar Jul 23 '18 13:07 tengerye

@tengerye I ran this code several times and found that the results are not stable. The best result I got is about 72% and the worst is about 67%. I think the instability comes from the adversarial training, which behaves similarly to a GAN.

CuthbertCai avatar Jul 24 '18 03:07 CuthbertCai

I ran an experiment that trained the model only on the source domain and predicted on the target domain.

Source Accuracy: 9513/10000 (95.0000%)
Target Accuracy: 2161/9001 (24.0000%)
Domain Accuracy: 7278/19001 (38.0000%)

This shows the method works, but the results still differ noticeably. Thank you very much.

tengerye avatar Jul 25 '18 01:07 tengerye

This time, I trained only on the target domain.

Source Accuracy: 9159/10000 (91.0000%)
Target Accuracy: 7692/9001 (85.0000%)
Domain Accuracy: 9962/19001 (52.0000%)

The interesting thing is that the model trained on the target domain has higher accuracy on the source domain than on the target domain itself.

tengerye avatar Jul 25 '18 06:07 tengerye

@tengerye After merging your request, I may have found the reason why our results are so different. In get_test_loader(), I used transforms.CenterCrop() while you used transforms.RandomCrop() for MNIST_M. So I tried transforms.CenterCrop() in the newest version: the target accuracy of the 'dann' model can reach 70+% after 50 epochs, and that of the 'target' mode can reach 94% after 20 epochs. In fact, I'm not sure which transform is the correct one.
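For context, a minimal sketch of the two test-time pipelines being compared; the exact Compose in get_test_loader() may differ from this:

import torchvision.transforms as transforms

# Deterministic test transform: always takes the same 28x28 center patch,
# so the reported test accuracy is reproducible across runs.
center_crop = transforms.Compose([
    transforms.CenterCrop(28),
    transforms.ToTensor(),
])

# Stochastic test transform: a different random 28x28 patch on every pass,
# which adds noise to the reported test accuracy.
random_crop = transforms.Compose([
    transforms.RandomCrop(28),
    transforms.ToTensor(),
])

Since evaluation is usually expected to be deterministic, CenterCrop is the more conventional choice for a test loader.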

CuthbertCai avatar Jul 27 '18 12:07 CuthbertCai

@CuthbertCai You are right. Just for the record, the accuracy fluctuates above 0.7 after 50 epochs in 'dann' mode.

The result of 100 epochs is:

Source Accuracy: 9839/10000 (98.0000%)
Target Accuracy: 7007/9001 (77.0000%)
Domain Accuracy: 9848/19001 (51.0000%)

For 'target' mode, the result after 20 epochs is:

Source Accuracy: 9570/10000 (95.0000%)
Target Accuracy: 8576/9001 (95.0000%)
Domain Accuracy: 8171/19001 (43.0000%)

Both were run with transforms.CenterCrop() at test time.

tengerye avatar Jul 31 '18 06:07 tengerye

I am updating the results, now produced with transforms.CenterCrop() instead of transforms.RandomCrop().

Train on source only:

Source Accuracy: 9834/10000 (98.0000%)
Target Accuracy: 3366/9001 (37.0000%)
Domain Accuracy: 7542/19001 (39.0000%)

Train on target only:

Source Accuracy: 7247/10000 (72.0000%)
Target Accuracy: 6669/9001 (74.0000%)
Domain Accuracy: 11612/19001 (61.0000%)

Train with dann:

Source Accuracy: 9827/10000 (98.0000%)
Target Accuracy: 6828/9001 (75.0000%)
Domain Accuracy: 10147/19001 (53.0000%)

tengerye avatar Aug 17 '18 01:08 tengerye

@tengerye @CuthbertCai Hi, I plotted my results (target acc: 0.78, domain acc: 0.54 at epoch 100), but the distributions are not mixed. Did you not run into this issue?

omg777 avatar Aug 24 '18 09:08 omg777

@omg777 I tested the original version, plotted the embeddings, and found that the distributions are not mixed. For the newest version, I just recorded the accuracy and did not check what the embeddings look like. I also plotted the embeddings of DAN and found that its distributions are mixed, even though its accuracy is lower than DANN's. So I'm not sure whether the embedding plot is an essential metric for domain adaptation.

CuthbertCai avatar Aug 24 '18 10:08 CuthbertCai

@CuthbertCai Thanks for your quick response. Does 'original version' mean the Caffe version of this paper? I found a DANN implementation in TensorFlow [https://github.com/pumpikano/tf-dann]. In that repo, the distributions they draw look well mixed, even though the accuracy is lower than yours. I'm satisfied with the results in this repo, but I want to check that the distributions are really mixed enough to deceive the domain discriminator.

Here is my results and plot.

Source Accuracy: 9889/10000 (98.0000%)
Target Accuracy: 8990/10000 (89.0000%)
Domain Accuracy: 10672/20000 (53.0000%)

[t-SNE embedding plot at epoch 99]

omg777 avatar Aug 24 '18 16:08 omg777

@omg777 The original version means the first version I pushed, not the Caffe implementation. In fact, I don't know why the embeddings are not mixed well while the accuracy is high. I guess there may be some problem in the t-SNE process, but I'm not sure. If you find the reason, please tell me. Thanks a lot.

CuthbertCai avatar Aug 24 '18 17:08 CuthbertCai

Hi, sorry about the late reply @omg777 @CuthbertCai. I was aware of that, but concluded it was because of too few training iterations. I will have a look at it. It is unlikely that the t-SNE code is wrong, since I have tested it many times, but I will check it as well just in case.

tengerye avatar Aug 27 '18 03:08 tengerye

@tengerye Hi, do you mean that this repo works well, including the t-SNE part? I compared the t-SNE code against the TensorFlow repo but haven't found any differences. I will test with more epochs as you mentioned and share the results here. Also, when I change the batch size in the MNIST and MNIST-M loaders, the t-SNE plot scales with the batch size; how can I always draw 500 samples like the TF repo does? Thanks!

omg777 avatar Aug 27 '18 04:08 omg777

@omg777 I wrote the t-SNE part after reading the corresponding TF code, so they are surely similar. :laughing:

As for a flexible number of samples for t-SNE, I will adjust that and update soon. The idea is sketched below.
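One minimal way to decouple the t-SNE sample count from the loader's batch size (a sketch; collect_embeddings and the loader name are hypothetical, not the repo's actual API):

import numpy as np
import torch
from sklearn.manifold import TSNE

def collect_embeddings(extractor, loader, num_samples=500, device='cpu'):
    # Accumulate feature embeddings batch by batch until exactly
    # num_samples points are collected, regardless of batch size.
    extractor.eval()
    feats, collected = [], 0
    with torch.no_grad():
        for inputs, _ in loader:
            batch = extractor(inputs.to(device)).cpu().numpy()
            take = min(len(batch), num_samples - collected)
            feats.append(batch[:take])
            collected += take
            if collected >= num_samples:
                break
    return np.concatenate(feats, axis=0)

# embeddings = collect_embeddings(extractor, target_test_loader)
# coords = TSNE(n_components=2).fit_transform(embeddings)  # 500 2-D points to plot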

tengerye avatar Aug 27 '18 08:08 tengerye

@omg777 Done. You can find it on my branch; I have sent a pull request to @CuthbertCai. By the way, my experiments are still running. Let's see if the problem is due to too few iterations.

tengerye avatar Aug 28 '18 02:08 tengerye

@tengerye @omg777 Here is my embedding plot after 100 epochs; the embeddings do not seem well mixed. [embeddings plot]

CuthbertCai avatar Aug 28 '18 05:08 CuthbertCai

@tengerye @CuthbertCai
I have some questions about code.

What is this code for? input1, label1 = input1[0:size, :, :, :], label1[0:size]

In the test code, there are two for loops. Is there any reason to keep them separate? In the train code, they are combined into one as enumerate(zip(..., ...)).

Thanks!

omg777 avatar Aug 28 '18 16:08 omg777

[embeddings_dann plot] This is from training with mixed examples ('dann' mode).

[embeddings_source plot] This is from source-only training.

[embeddings_target plot] This is from target-only training.

All of them were obtained at 100 epochs. I still think the reason might be too few epochs; I will try more today.

@omg777 For your question, would you mind telling us the file name and the corresponding line number in the latest version, please?

tengerye avatar Aug 29 '18 01:08 tengerye

@omg777 For training in 'dann' mode, I want to align the shapes of samples from the two domains, so I wrote some code to make sure every batch from the source and target domains has the same shape. Also, in this mode we need samples from both domains in each iteration to compute the loss, so I use zip() to draw inputs from the two dataloaders in one iteration. In testing we don't compute the loss, so I wrote one loop per domain. For training in 'source' and 'target' modes, I think the code you mentioned may not be necessary. A sketch of the pattern is below.
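A minimal sketch of the pattern just described (the loader names are illustrative, not the repo's exact code):

# Joint iteration for 'dann' training: each step needs one source batch
# and one target batch, truncated to a common size so the shapes match.
for (src_x, src_y), (tgt_x, tgt_y) in zip(source_loader, target_loader):
    size = min(src_x.shape[0], tgt_x.shape[0])
    src_x, src_y = src_x[0:size], src_y[0:size]  # the slicing you asked about
    tgt_x = tgt_x[0:size]
    # class loss is computed on (src_x, src_y); domain loss on both batches

# Testing: no joint loss is needed, so each domain gets its own loop.
for x, y in source_loader:
    pass  # accumulate source accuracy here
for x, y in target_loader:
    pass  # accumulate target accuracy here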

CuthbertCai avatar Aug 29 '18 14:08 CuthbertCai

@CuthbertCai @omg777 I extended the 'dann' training to 300 epochs, but the embeddings are still not mixed. Then I checked the journal paper, which only plots the embeddings for SVHN. I have no idea so far, so maybe we could try adding another experiment to this project?

tengerye avatar Aug 30 '18 01:08 tengerye

@tengerye Experiments on SVHN and SynDig have been added, but the accuracies are not very good. So maybe we need to find the bugs together.

CuthbertCai avatar Aug 31 '18 14:08 CuthbertCai

@CuthbertCai Sure, I will take a look.

tengerye avatar Sep 01 '18 06:09 tengerye

(Quoting @omg777's results above: Source Accuracy 9889/10000 (98.0000%), Target Accuracy 8990/10000 (89.0000%), Domain Accuracy 10672/20000 (53.0000%).)

What's the stopping criteria used?

I'm getting this around epoch 80:

Source Accuracy: 9885/10000 (98.0000%)
Target Accuracy: 7888/9001 (87.0000%)
Domain Accuracy: 10366/19001 (54.0000%)

Parameter choices: gamma = 10, lr = 0.001
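For reference, this gamma presumably corresponds to the γ in the DANN paper's schedule for the gradient-reversal coefficient λ, which is ramped from 0 to 1 as training progresses:

import numpy as np

# Schedule from the DANN paper (Ganin et al.): p in [0, 1] is training
# progress; the paper uses gamma = 10.
def grl_lambda(p, gamma=10.0):
    return 2.0 / (1.0 + np.exp(-gamma * p)) - 1.0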

I'm wondering what could be the reason that this implementation gets better accuracy than what is reported in the paper?

ashwin-999 avatar Oct 26 '18 16:10 ashwin-999

@AshwinAKannan Hi, the stopping criterion is a fixed number of iterations. Although the accuracy is better, the embedding plots do not seem mixed. We are still working on it, and we welcome any help.

tengerye avatar Oct 29 '18 03:10 tengerye

Hello,

I ran an experiment for 1000 epochs with a slight change in the model: I removed the batchnorm layers. The idea was to have a model similar to the one described in the supplementary material (http://sites.skoltech.ru/compvision/projects/grl/files/suppmat.pdf). Here's what the result looks like after 1000 epochs:

Source Accuracy: 9841/10000 (98.0000%)
Target Accuracy: 7526/9001 (83.0000%)
Domain Accuracy: 10484/19001 (55.0000%)

Epoch 990: [embedding_990 plot]

Epochs around 300 also seem to show the expected distribution: [embedding_300 plot]

Params: batch_size = 128, epochs = 1000, gamma = 10, theta = 1, lr = 0.0001

Model:


import torch.nn as nn
import torch.nn.functional as F
# GradReverse is the gradient reversal layer defined elsewhere in the repo.


class Extractor(nn.Module):
    """Shared feature extractor: 28x28x3 input -> 48*4*4 feature vector."""

    def __init__(self):
        super(Extractor, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 48, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()

    def forward(self, input):
        # Expand grayscale MNIST inputs to 3 channels to match MNIST-M.
        input = input.expand(input.shape[0], 3, 28, 28)
        x = F.relu(F.max_pool2d(self.conv1(input), 2))                # 28 -> 24 -> 12
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))  # 12 -> 8 -> 4
        x = x.view(-1, 48 * 4 * 4)
        return x


class Class_classifier(nn.Module):
    """Label predictor head."""

    def __init__(self):
        super(Class_classifier, self).__init__()
        self.fc1 = nn.Linear(48 * 4 * 4, 100)
        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, input):
        logits = F.relu(self.fc1(input))
        # Pass self.training so dropout is disabled at eval time
        # (F.dropout defaults to training=True).
        logits = self.fc2(F.dropout(logits, training=self.training))
        logits = F.relu(logits)
        logits = self.fc3(logits)
        return F.log_softmax(logits, 1)


class Domain_classifier(nn.Module):
    """Domain discriminator head, fed through the gradient reversal layer."""

    def __init__(self):
        super(Domain_classifier, self).__init__()
        self.fc1 = nn.Linear(48 * 4 * 4, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, input, constant):
        input = GradReverse.grad_reverse(input, constant)
        logits = F.relu(self.fc1(input))
        logits = F.log_softmax(self.fc2(logits), 1)
        return logits
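
GradReverse isn't shown in the snippet above; for completeness, here is a minimal gradient reversal layer consistent with how it's called there (the repo's actual implementation may differ):

import torch

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies the gradient by -constant
    # in the backward pass (the core trick of DANN).

    @staticmethod
    def forward(ctx, x, constant):
        ctx.constant = constant
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient; None is for the constant argument.
        return grad_output.neg() * ctx.constant, None

    @staticmethod
    def grad_reverse(x, constant):
        return GradReverse.apply(x, constant)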

Though they look mixed, they don't form clear clusters. Thoughts?

ashwin-999 avatar Oct 30 '18 12:10 ashwin-999

Hi, thanks for your help! Based on your result, I think the batchnorm layers are the reason the embeddings are not mixed; I added them to the model myself. Maybe we could plot more points to see how they form clusters.

CuthbertCai avatar Oct 30 '18 14:10 CuthbertCai

Embedding with 1024 points (epoch ~400): [embedding_430 plot]

I noticed that the MNIST results in the paper don't form distinct clusters either.

ashwin-999 avatar Oct 31 '18 03:10 ashwin-999

@CuthbertCai @AshwinAKannan @omg777 I finished the experiments: batchnorm does not affect the final performance, only the embedding graph. I committed a new pull request.

FYI, both at epoch 90. With batchnorm: [embedding_90 plot]

Without batchnorm: [embedding_90 plot]

tengerye avatar Nov 18 '18 02:11 tengerye