pytorch_DANN
Result has huge difference
Hi, the results of my code differ hugely from yours. At epoch 99, my results are:
Source Accuracy: 9839/10000 (98.0000%)
Target Accuracy: 5958/9001 (66.0000%)
Domain Accuracy: 10646/19001 (56.0000%)
May I ask how many epochs you ran to get that result? Thank you in advance.
@tengerye I ran this code several times and found that the results are not stable. The best result I got is about 72% and the worst is about 67%. I think the instability comes from the adversarial training, which behaves similarly to a GAN.
I ran an experiment that trained the model only on the source domain and predicted on the target domain:
Source Accuracy: 9513/10000 (95.0000%)
Target Accuracy: 2161/9001 (24.0000%)
Domain Accuracy: 7278/19001 (38.0000%)
which shows the method works, though the results still differ noticeably. Thank you very much.
This time, I trained on the target domain:
Source Accuracy: 9159/10000 (91.0000%)
Target Accuracy: 7692/9001 (85.0000%)
Domain Accuracy: 9962/19001 (52.0000%)
The interesting thing is that the model trained on the target domain still has higher accuracy on the source domain (91%) than on the target domain (85%).
@tengerye After merging your request, I may have found the reason why our results are so different. In the function get_test_loader(), I used transforms.CenterCrop() while you used transforms.RandomCrop() for MNIST_M. So I tried transforms.CenterCrop() with the newest version: the target accuracy of the 'dann' model can reach 70+% after 50 epochs, and the 'target' mode can reach 94% after 20 epochs. In fact, I'm not sure which transform is the correct one.
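For reference, here is a minimal sketch of the two test-time transforms being compared; the crop size of 28 and the normalization values are placeholders rather than the repo's exact settings.

from torchvision import transforms

# deterministic evaluation: always take the central 28x28 patch
center_crop_test = transforms.Compose([
    transforms.CenterCrop(28),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

# stochastic evaluation: a different random 28x28 patch on every pass,
# so the measured test accuracy varies from run to run
random_crop_test = transforms.Compose([
    transforms.RandomCrop(28),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

Since RandomCrop keeps injecting randomness at evaluation time, CenterCrop is the more conventional choice for a test loader.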
@CuthbertCai You are right. Just for the record, the accuracy fluctuates above 0.7 after 50 epochs of 'dann'.
The result of 'dann' after 100 epochs is:
Source Accuracy: 9839/10000 (98.0000%)
Target Accuracy: 7007/9001 (77.0000%)
Domain Accuracy: 9848/19001 (51.0000%)
For the 'target' mode, the result after 20 epochs is:
Source Accuracy: 9570/10000 (95.0000%)
Target Accuracy: 8576/9001 (95.0000%)
Domain Accuracy: 8171/19001 (43.0000%)
Both are evaluated with transforms.CenterCrop() on the test set.
I am updating the results, which are now produced with transforms.CenterCrop() instead of transforms.RandomCrop().
Train on source only:
Source Accuracy: 9834/10000 (98.0000%)
Target Accuracy: 3366/9001 (37.0000%)
Domain Accuracy: 7542/19001 (39.0000%)
Train on target only:
Source Accuracy: 7247/10000 (72.0000%)
Target Accuracy: 6669/9001 (74.0000%)
Domain Accuracy: 11612/19001 (61.0000%)
Train with dann:
Source Accuracy: 9827/10000 (98.0000%)
Target Accuracy: 6828/9001 (75.0000%)
Domain Accuracy: 10147/19001 (53.0000%)
@tengerye @CuthbertCai Hi, I plotted my results (Target Acc: 0.78, Domain Acc: 0.54 at epoch 100), but the distributions are not mixed. Did you not run into this issue?
@omg777 I tested the original version, plotted the embeddings, and found that the distributions are not mixed. For the newest version, I just recorded the accuracy and did not check what the embeddings look like. I also plotted the embeddings of DAN and found that its distributions are mixed even though its accuracy is lower than DANN's. Thus, I'm not sure whether the embedding plot is an essential metric for domain adaptation.
@CuthbertCai Thanks for your quick response. Does 'original version' mean the Caffe version from the paper? I found a DANN implementation in TensorFlow: https://github.com/pumpikano/tf-dann. In that repo, the distributions they draw look well mixed, even though the accuracy is lower than yours. I'm satisfied with the results in this repo, but I want to check that the distributions are really mixed well enough to deceive the domain discriminator.
Here is my results and plot.
Source Accuracy: 9889/10000 (98.0000%)
Target Accuracy: 8990/10000 (89.0000%)
Domain Accuracy: 10672/20000 (53.0000%)
@omg777 The original version means the first version I pushed, not the Caffe implementation. In fact, I don't know why the embeddings are not mixed well while the accuracy is high. I suspect there may be some problem in the t-SNE process, but I'm not sure. If you find the reason, please tell me. Thanks a lot.
Hi, sorry about the late reply @omg777 @CuthbertCai. I was aware of that but concluded it was due to too few training iterations. I will have a look at it. It is unlikely that the t-SNE is wrong, since I have tested it many times, but I will check it as well just in case.
@tengerye Hi, do you mean that this repo works well, including the t-SNE? I compared the t-SNE code against the TensorFlow repo but haven't found any differences. I will test with more epochs as you mentioned and share the results here. Also, when I change the batch size of the MNIST and MNIST-M loaders, the t-SNE plot changes with the batch size. How can I always draw 500 samples like the TF repo does? Thanks!
@omg777 I wrote the t-SNE part after reading the corresponding code in the TF repo, so it is surely similar. :laughing:
As for a flexible number of samples for t-SNE, I will adjust that and update soon.
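One possible way to make the plot independent of the batch size, sketched here with illustrative names (collect_embeddings and the 500-sample figure are assumptions, not the repo's actual code), is to accumulate features over batches and truncate before running t-SNE:

import torch
from sklearn.manifold import TSNE

def collect_embeddings(extractor, dataloader, num_samples=500, device='cpu'):
    # accumulate features over batches until at least num_samples points are
    # gathered, so the plot size no longer depends on the loader's batch size
    feats, labels = [], []
    extractor.eval()
    with torch.no_grad():
        for inputs, targets in dataloader:
            feats.append(extractor(inputs.to(device)).cpu())
            labels.append(targets)
            if sum(f.shape[0] for f in feats) >= num_samples:
                break
    feats = torch.cat(feats)[:num_samples].numpy()
    labels = torch.cat(labels)[:num_samples].numpy()
    return feats, labels

# feats, labels = collect_embeddings(extractor, test_loader)
# emb_2d = TSNE(n_components=2).fit_transform(feats)  # 2-D points for the scatter plot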
@omg777 Done. You can find it on my branch; I have sent a pull request to @CuthbertCai. By the way, my experiments are still running. Let's see if the problem is due to too few iterations.
@tengerye @omg777 Here is my embedding plot after 100 epochs; the embeddings do not seem well mixed.
@tengerye @CuthbertCai
I have some questions about the code.
What is this line for?
input1, label1 = input1[0:size, :, :, :], label1[0:size]
In the test code, there are two for loops. Is there any reason to separate them? In the training code, they are combined into one as enumerate(zip(..., ...)).
Thanks!
This is from training with mixed examples ('dann' mode).
This is from source-only training.
This is from target-only training.
All of them were obtained at 100 epochs. I still think the reason might be too few epochs; I will try more epochs today.
@omg777 For your question, do you mind telling us the name of the file and the corresponding line number in the latest version, please?
@omg777 For training in 'dann' mode, I want to align the shapes of the samples from the two domains, so I wrote some code to make sure every batch from the source and target domains has the same shape. Also, in this mode we have to use samples from both domains in each iteration to compute the loss, so I use zip() to get inputs from the two dataloaders in one iteration (roughly as sketched below). In the testing phase we don't have to compute the loss, so I write one loop per domain. For training in 'source' and 'target' mode, I think the code you mentioned may not be necessary.
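A rough sketch of that training loop, with simplified loss weighting (the repo also applies a theta weight to the domain loss) and placeholder argument names:

import torch
import torch.nn.functional as F

def train_dann_epoch(extractor, class_classifier, domain_classifier,
                     source_loader, target_loader, optimizer, constant):
    # zip() stops at the shorter loader, so each iteration yields one source
    # batch and one target batch
    for (input1, label1), (input2, _) in zip(source_loader, target_loader):
        # the final batches may differ in size; truncate both to the smaller
        # one so the source and target tensors can be handled together
        size = min(input1.shape[0], input2.shape[0])
        input1, label1 = input1[0:size, :, :, :], label1[0:size]
        input2 = input2[0:size, :, :, :]

        optimizer.zero_grad()
        src_feat = extractor(input1)
        tgt_feat = extractor(input2)

        # label prediction loss uses only the labeled source samples
        class_loss = F.nll_loss(class_classifier(src_feat), label1)

        # domain loss uses both domains: label 0 = source, 1 = target
        domain_loss = F.nll_loss(domain_classifier(src_feat, constant),
                                 torch.zeros(size, dtype=torch.long)) + \
                      F.nll_loss(domain_classifier(tgt_feat, constant),
                                 torch.ones(size, dtype=torch.long))

        (class_loss + domain_loss).backward()
        optimizer.step()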
@CuthbertCai @omg777 I extended the 'dann' training to 300 epochs, but the embeddings are still not mixed. Then I checked the journal paper, which only plots the embeddings for SVHN. I have no idea so far, so maybe we could try adding another experiment to this project?
@tengerye Experiments on SVHN and SynDig have been added, but the accuracies are not very good. So maybe we need to find the bugs together.
@CuthbertCai Sure, I will take a look.
What's the stopping criterion used?
I'm getting this around epoch 80:
Source Accuracy: 9885/10000 (98.0000%)
Target Accuracy: 7888/9001 (87.0000%)
Domain Accuracy: 10366/19001 (54.0000%)
Parameter choices: gamma = 10, lr = 0.001.
I'm wondering what could be the reason that this implementation gets better accuracy than what's reported in the paper?
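For context, gamma presumably enters the gradient-reversal weight schedule from the DANN paper, where the constant passed to the domain classifier is ramped up as training progresses; here is a sketch assuming this repo follows the paper's schedule (grl_constant is just an illustrative name):

import numpy as np

def grl_constant(epoch, num_epochs, gamma=10.0):
    # p goes from 0 to 1 over training; the DANN paper ramps the gradient
    # reversal weight as lambda_p = 2 / (1 + exp(-gamma * p)) - 1
    p = float(epoch) / num_epochs
    return 2.0 / (1.0 + np.exp(-gamma * p)) - 1.0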
@AshwinAKannan Hi, the stopping criterion is a fixed number of iterations. Although the accuracy is better, the embedding plots do not seem mixed. We are still working on it and welcome any help.
Hello,
I ran an experiment for 1000 epochs with a slight change in the model: I removed the batchnorm layers. The idea was to have a model similar to the one described in the supplementary material (http://sites.skoltech.ru/compvision/projects/grl/files/suppmat.pdf). Here's what the result looks like after 1000 epochs:
Source Accuracy: 9841/10000 (98.0000%)
Target Accuracy: 7526/9001 (83.0000%)
Domain Accuracy: 10484/19001 (55.0000%)
Epoch 990:
Epochs around 300 also seem to have the expected distribution:
Params: batch_size = 128, epochs = 1000, gamma = 10, theta = 1, lr = 0.0001
Model:
import torch.nn as nn
import torch.nn.functional as F

class Extractor(nn.Module):
    def __init__(self):
        super(Extractor, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 48, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()

    def forward(self, input):
        # expand grayscale MNIST images to 3 channels so both domains share one extractor
        input = input.expand(input.data.shape[0], 3, 28, 28)
        x = F.relu(F.max_pool2d(self.conv1(input), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # 28 -> 24 -> 12 -> 8 -> 4 after the two conv+pool stages, so the feature size is 48 * 4 * 4
        x = x.view(-1, 48 * 4 * 4)
        return x

class Class_classifier(nn.Module):
    def __init__(self):
        super(Class_classifier, self).__init__()
        self.fc1 = nn.Linear(48 * 4 * 4, 100)
        self.fc2 = nn.Linear(100, 100)
        self.fc3 = nn.Linear(100, 10)

    def forward(self, input):
        logits = F.relu(self.fc1(input))
        # note: F.dropout defaults to training=True regardless of model.eval()
        logits = self.fc2(F.dropout(logits))
        logits = F.relu(logits)
        logits = self.fc3(logits)
        return F.log_softmax(logits, 1)

class Domain_classifier(nn.Module):
    def __init__(self):
        super(Domain_classifier, self).__init__()
        self.fc1 = nn.Linear(48 * 4 * 4, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, input, constant):
        # GradReverse is the gradient reversal layer defined elsewhere in this repo
        input = GradReverse.grad_reverse(input, constant)
        logits = F.relu(self.fc1(input))
        logits = F.log_softmax(self.fc2(logits), 1)
        return logits
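As a quick sanity check of the feature dimensions above (a throwaway snippet, not part of the repo; the domain classifier is omitted because it needs the repo's GradReverse):

import torch

extractor = Extractor()
classifier = Class_classifier()

dummy = torch.randn(4, 3, 28, 28)   # a fake batch of four 28x28 RGB images
features = extractor(dummy)         # 28 -> 24 -> 12 -> 8 -> 4, so shape (4, 48 * 4 * 4)
print(features.shape)               # torch.Size([4, 768])
print(classifier(features).shape)   # torch.Size([4, 10]), log-probabilities over 10 classes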
Though they look mixed, they don't form clear clusters. Thoughts?
Hi, thanks for your help! According to your result, I think the batchnorm layers are the reason the embeddings are not mixed; I added them to the model myself. Maybe we could plot more points to see how they form clusters.
embedding with 1024 points (epoch ~400):
I noticed that the MNIST results shown in the paper don't form distinct clusters either.
@CuthbertCai @AshwinAKannan @omg777 I finished the experiments: batchnorm does not affect the final performance, only the embedding graph. I committed a new pull request.
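For anyone who wants to reproduce the comparison, here is a sketch of an extractor with a switch for the batchnorm layers; this is only an illustration under my own naming (ExtractorBN), not the code used for the plots below.

class ExtractorBN(nn.Module):
    """Same convolutional trunk as above, with optional batchnorm after each conv."""
    def __init__(self, use_batchnorm=True):
        super(ExtractorBN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 48, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        # an empty nn.Sequential() acts as a no-op when batchnorm is disabled
        self.bn1 = nn.BatchNorm2d(32) if use_batchnorm else nn.Sequential()
        self.bn2 = nn.BatchNorm2d(48) if use_batchnorm else nn.Sequential()

    def forward(self, input):
        input = input.expand(input.data.shape[0], 3, 28, 28)
        x = F.relu(F.max_pool2d(self.bn1(self.conv1(input)), 2))
        x = F.relu(F.max_pool2d(self.bn2(self.conv2_drop(self.conv2(x))), 2))
        return x.view(-1, 48 * 4 * 4)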
FYI, both of them at epoch 90,
with batchnorm:
without batchnorm: