swag
Version of InceptionNet
When I read the paper, I saw that Figure 6 shows results for WideResNet and Inception. Could you provide code, or an outline of how to implement these two models with SWAG? Thank you so much.
Hi, this code hasn't been polished yet; I will upload it later. The idea is the same as for ResNet: just apply smoothing before computing the loss. The implementation details can be found in the paper and the supplementary material.
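A minimal numpy sketch of "smoothing before the loss", assuming the smoothing is a softmax over all activations of a layer (the same transform used in the entropy snippet further down this thread), applied to both the generated and the style features before the Gram-matrix L2 loss. The exact transform, normalization and layer choice in the paper and its supplementary material may differ. `softmax_smooth`, `gram` and `style_loss` are hypothetical names, and the random arrays stand in for feature maps of shape (C, H, W).

```python
import numpy as np


def softmax_smooth(feat):
    """Hypothetical smoothing step: softmax over all C*H*W activations of one layer."""
    flat = feat.reshape(-1)
    e = np.exp(flat - flat.max())  # subtract the max for numerical stability
    return (e / e.sum()).reshape(feat.shape)


def gram(feat):
    """Gram matrix of a (C, H, W) feature map, normalized by C*H*W."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)


def style_loss(gen_feats, style_feats, smooth=True):
    """Sum over layers of the mean squared (L2) Gram difference, optionally smoothing first."""
    loss = 0.0
    for g, s in zip(gen_feats, style_feats):
        if smooth:
            g, s = softmax_smooth(g), softmax_smooth(s)
        loss += float(np.mean((gram(g) - gram(s)) ** 2))
    return loss


rng = np.random.default_rng(0)
style_feats = [rng.standard_normal((8, 16, 16)) for _ in range(3)]  # stand-in style features
gen_feats = [rng.standard_normal((8, 16, 16)) for _ in range(3)]    # stand-in generated features

print(style_loss(gen_feats, style_feats))                # smoothed Gram loss
print(style_loss(gen_feats, style_feats, smooth=False))  # plain Gram loss for comparison
```

Passing `smooth=False` gives the plain Gram-matrix loss, so the only difference between the two variants is the transform applied to the features before the Gram matrices are formed.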
Thank you for your answer. I have another question. Figure 3 shows activation statistics of different random architectures. Can you tell me how to calculate the entropy of those layers, and share any papers, articles or other information about that method? Thank you so much.
Equations 5-8 describe the calculation. In fact, it is no different from the textbook definition of entropy; we just normalize it to [0, 1], which is also a standard operation.
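Written out, the computation in the scipy snippet shared later in this thread reads as follows (this is a reading of that code, not a transcription of Equations 5-8): for a layer with N activations x_1, ..., x_N,

```latex
p_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}, \qquad
H = -\sum_{i=1}^{N} p_i \log p_i, \qquad
\bar{H} = \frac{H}{\log N} \in [0, 1]
```

Dividing by log N maps the result to [0, 1] because H reaches its maximum, log N, for the uniform distribution p_i = 1/N, and equals 0 when all the probability sits on a single activation.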
Thank you for your answer. As I understand it, lower Gram entropy leads to poorer stylization performance. I have two more questions:
- Deeper layers have lower Gram entropy (and those layers usually learn the features that matter for classification). Do deeper layers make stylization worse than shallow ones?
- Is stylization the opposite of classification? That is, does a good classification model have high entropy in its deep layers, whereas for stylization the deep layers have low entropy?
For 1: first of all, everything discussed in the paper is within the scope of methods based on an L2 loss and Gram-matrix optimization (the very popular and common family, which also includes perceptual losses). Lower entropy just makes the L2 optimization hard, so it becomes hard to match the distributions of the style image and the target image. In my understanding, the deeper layers really do encode more informative features of the images, which may be helpful for stylization; the issue is just that they are hard to match. In other words, with losses other than L2 we might be able to leverage the deeper layers and get even better stylized results. For other methods such as AdaIN, I am not sure whether deeper layers hurt performance; we haven't explored this.

For 2: some of this is covered in the answer to question 1. In my opinion, high entropy just makes the optimization easier in the L2 and Gram-matrix setting. It is hard to say that high entropy is always better than low entropy for stylization; it depends on the loss, for example, and maybe on other factors.
I have a question that is outside the scope of the paper. Can we confirm that for SOTA classification models (VGG, ResNet, Inception, ...) the deeper layers have higher entropy (activation entropy) than the shallow ones?
In my experiments, I mainly worked with random networks. The observation is that deeper layers have lower entropy, e.g. in ResNet. For well-trained models the observation is the same, but the layer-by-layer difference is less pronounced. I think this makes sense: in a well-trained model the deep layers encode high-level semantics. For example, in the final classification layer, ideally only one node is activated to indicate the predicted class, so the entropy is low. Shallow layers, in contrast, encode low-level features, and any single image contains many of them, so more neurons are activated. Consequently, their entropy is relatively high.
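A toy numpy illustration of that argument, using hand-made probability vectors rather than activations from a real network: the normalized entropy is 0 when a single unit carries all the mass (the ideal classifier output described above), 1 when all units are equally active, and in between when only part of the units are active. `normalized_entropy` is a hypothetical helper that takes probabilities directly, so unlike the scipy snippet later in the thread it skips the softmax step.

```python
import numpy as np


def normalized_entropy(p):
    """Entropy of a probability vector divided by log(N), so it lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # terms with p_i = 0 contribute nothing (0 * log 0 is taken as 0)
    return float(np.sum(nz * np.log(1.0 / nz)) / np.log(p.size))


n = 1000
one_hot = np.zeros(n)
one_hot[42] = 1.0              # only one unit is active, like an ideal classifier output
broad = np.full(n, 1.0 / n)    # every unit equally active
half = np.zeros(n)
half[:500] = 1.0 / 500         # half of the units equally active

print(normalized_entropy(one_hot))  # 0 for a single active unit
print(normalized_entropy(broad))    # ≈ 1.0
print(normalized_entropy(half))     # log(500) / log(1000) ≈ 0.8997
```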
Thank you so much for the explanation. I really like the idea of your paper, so I want to understand it thoroughly. I have learned a lot. Wishing you health, success and happiness!
Did you implement Equations 5-8 from scratch or use a built-in module/function? Could you share the code? Thanks in advance.
Yes, built-in. Something like this:
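```python
import numpy as np
from scipy.special import softmax
from scipy.stats import entropy

# featuremap: a hidden activation tensor from the network (e.g. shape 1 x C x H x W)
feature = featuremap.detach().cpu().numpy().squeeze()
feature_stat = feature.flatten()
feature_stat = softmax(feature_stat)  # turn the activations into a probability distribution
cur_entropy = entropy(feature_stat) / np.log(len(feature_stat))  # normalize to [0, 1]
```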
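For anyone who wants to cross-check the scipy version, here is a sketch of the same computation in plain numpy: softmax over all flattened activations, entropy in nats (scipy's default base), divided by log N. `normalized_softmax_entropy` is a hypothetical name; it expects a numpy array, e.g. a feature map converted with `.detach().cpu().numpy()`, and the toy arrays below only illustrate the extremes.

```python
import numpy as np


def normalized_softmax_entropy(activations):
    """Softmax over all activations, then entropy (nats) divided by log(N)."""
    x = np.asarray(activations, dtype=np.float64).ravel()
    x = x - x.max()                      # shift for stability; softmax is shift-invariant
    log_p = x - np.log(np.exp(x).sum())  # log-softmax
    p = np.exp(log_p)
    return float(-(p * log_p).sum() / np.log(x.size))


rng = np.random.default_rng(0)
flat_map = np.zeros((64, 8, 8))          # identical activations -> uniform softmax
peaked_map = np.zeros((64, 8, 8))
peaked_map[0, 0, 0] = 50.0               # one dominant activation
noisy_map = rng.standard_normal((64, 8, 8))

print(normalized_softmax_entropy(flat_map))    # ≈ 1.0
print(normalized_softmax_entropy(peaked_map))  # close to 0
print(normalized_softmax_entropy(noisy_map))   # between the two
```

Working with the log-softmax avoids computing `log(0)` when some probabilities underflow, which can happen for large activations.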
I have tried this on a pretrained VGG. But when I tried two different input images, the entropy values were different. I think I made a mistake: in my mind, the entropy value should be the same for different inputs. Can you explain this? Thanks in advance.
The entropy differs from one style image to another. What our plots show is the average.
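Because the entropy varies from image to image, a per-layer plot needs an aggregate. A minimal sketch of that aggregation, assuming the per-image values are collected in a 2-D array (rows = style images, columns = probed layers). The array here is filled with random placeholder numbers, not measurements, and the standard deviation is an extra that shows how far individual images spread around the average.

```python
import numpy as np

layer_names = ["conv2_1", "conv2_3", "conv3_2", "conv3_4",
               "conv4_3", "conv4_6", "conv5_1", "conv5_3"]

# Placeholder data: random numbers standing in for per-image normalized entropies
# (rows = style images, columns = probed layers). These are NOT real measurements.
rng = np.random.default_rng(0)
entropies = rng.uniform(0.0, 1.0, size=(100, len(layer_names)))

mean_per_layer = entropies.mean(axis=0)  # the value one would plot for each layer
std_per_layer = entropies.std(axis=0)    # spread across style images

for name, m, s in zip(layer_names, mean_per_layer, std_per_layer):
    print(f"{name}: {m:.3f} ± {s:.3f}")
```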
How many style images did you test?
I tried calculating the entropy with 10,000 style images. The result is not what I expected: the entropy does not decrease in the deeper layers. Did I make a mistake?
I see you are using VGG19. For VGG, the entropy doesn't decrease; the decrease is observed for ResNet.
I don't get the same result as you, and I can't find my mistake. This is my code:
```python
import glob

import numpy as np
import PIL
import torch
import torch.nn as nn
from PIL import Image
from scipy.special import softmax
from scipy.stats import entropy
from torchvision import transforms
from torchvision.models.resnet import resnet34

PIL.Image.MAX_IMAGE_PIXELS = 933120000  # allow very large style images

# Flatten ResNet-34 into one Sequential so it can be sliced block by block.
res = resnet34(pretrained=True)
# Note: the model is not switched to eval() mode, so BatchNorm uses the statistics
# of each single image; call res.eval() here to use the running statistics instead.
res1 = nn.Sequential(*list(res.children()))
all_layers = [*res1[:4], *res1[4], *res1[5], *res1[6], *res1[7], *res1[8:]]
new_sequential = nn.Sequential(*all_layers)

res_slice1 = new_sequential[:5]     # conv2_1
res_slice2 = new_sequential[5:7]    # conv2_3
res_slice3 = new_sequential[7:9]    # conv3_2
res_slice4 = new_sequential[9:11]   # conv3_4
res_slice5 = new_sequential[11:14]  # conv4_3
res_slice6 = new_sequential[14:17]  # conv4_6
res_slice7 = new_sequential[17:18]  # conv5_1
res_slice8 = new_sequential[18:20]  # conv5_3

trans = transforms.Compose([transforms.RandomCrop(256),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                 std=[0.229, 0.224, 0.225])])


def calculate_entropy(featuremap):
    """Normalized (to [0, 1]) entropy of the softmax over all activations."""
    feature = featuremap.detach().cpu().numpy().squeeze()
    feature_stat = softmax(feature.flatten())
    return entropy(feature_stat) / np.log(feature_stat.size)


def extractor(image):
    """Run the slices in sequence and return the eight intermediate outputs."""
    h1 = res_slice1(image)
    h2 = res_slice2(h1)
    h3 = res_slice3(h2)
    h4 = res_slice4(h3)
    h5 = res_slice5(h4)
    h6 = res_slice6(h5)
    h7 = res_slice7(h6)
    h8 = res_slice8(h7)
    return h1, h2, h3, h4, h5, h6, h7, h8


image_style_dir = "/content/gdrive/MyDrive/style-transfer-research/style_train"
style = glob.glob(image_style_dir + "/*")

layer_names = ["conv2_1", "conv2_3", "conv3_2", "conv3_4",
               "conv4_3", "conv4_6", "conv5_1", "conv5_3"]
layer_entropies = [[] for _ in layer_names]  # one list of per-image entropies per layer

with torch.no_grad():  # only activations are needed, no gradients
    for style_img in style[:10000]:
        try:
            img_style = Image.open(style_img)
            x = trans(img_style).unsqueeze_(0)
        except Exception:
            # e.g. unreadable file, image smaller than the 256x256 crop, or not 3 channels
            continue
        for entropies, h in zip(layer_entropies, extractor(x)):
            entropies.append(calculate_entropy(h))

# Average entropy per layer over all processed style images
for name, entropies in zip(layer_names, layer_entropies):
    print(name, np.mean(entropies))
```
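One way to double-check the slice boundaries without loading any weights: a small sketch that lists the module names of the flattened Sequential, assuming torchvision's ResNet-34 stage sizes (3, 4, 6, 3) and the convN_M labels used in the comments. `flattened_resnet_names` is a hypothetical helper. ResNet-50 uses the same stage sizes, so the same indices apply to it as well.

```python
def flattened_resnet_names(blocks_per_stage=(3, 4, 6, 3)):
    """Names of the modules in the flattened Sequential.

    The stem (conv1, bn1, relu, maxpool) comes first, then one entry per residual
    block labelled conv{stage}_{block} with stages numbered from 2, then avgpool and fc.
    """
    names = ["conv1", "bn1", "relu", "maxpool"]
    for stage, n_blocks in enumerate(blocks_per_stage, start=2):
        names += [f"conv{stage}_{i}" for i in range(1, n_blocks + 1)]
    return names + ["avgpool", "fc"]


names = flattened_resnet_names()
slices = [(0, 5), (5, 7), (7, 9), (9, 11), (11, 14), (14, 17), (17, 18), (18, 20)]
for start, stop in slices:
    # the output of new_sequential[start:stop] is the output of its last module
    print(f"[{start}:{stop}] ends at {names[stop - 1]}")
```

Each slice ends at the block named in the corresponding comment, so the slicing itself matches the intended layers.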
Hi, I didn't check your code carefully, but two clear differences may explain why our histograms differ. First, the experiments in Figure 3 of my paper are based on random networks, but I see you loaded a pretrained one. In our experiments the entropy decrease is indeed less distinct for pretrained models than for randomly initialized ones, but even so, your histogram shows it actually dropping up to conv5_3. Second, in my experiment I used ResNet-50, not ResNet-34. According to our analysis, the deeper the network, the clearer these entropy differences are. But this should not be the main issue. So I expect that when you switch to a random model, you will get the same figure as in our paper.
Here is my result. I think I made a mistake, but I can't find it. Do you have any idea? Thanks in advance.
```python
import glob

import numpy as np
import PIL
import torch
import torch.nn as nn
from PIL import Image
from scipy.special import softmax
from scipy.stats import entropy
from torchvision import transforms
from torchvision.models.resnet import resnet50

PIL.Image.MAX_IMAGE_PIXELS = 933120000  # allow very large style images

trans = transforms.Compose([transforms.RandomCrop(256),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                 std=[0.229, 0.224, 0.225])])


def calculate_entropy(featuremap):
    """Normalized (to [0, 1]) entropy of the softmax over all activations."""
    feature = featuremap.detach().cpu().numpy().squeeze()
    feature_stat = softmax(feature.flatten())
    return entropy(feature_stat) / np.log(feature_stat.size)


class ResNetFeatureExtraction(nn.Module):
    """Randomly initialized ResNet-50 that returns eight intermediate block outputs."""

    def __init__(self):
        super().__init__()
        self.net = resnet50(pretrained=False)  # random weights

    def forward(self, input, output_last_feature=False):
        # Stem
        output = self.net.conv1(input)
        output = self.net.bn1(output)
        output = self.net.relu(output)
        output = self.net.maxpool(output)
        # layer1 (conv2_x)
        h1 = self.net.layer1[0](output)  # conv2_1
        output = self.net.layer1[1](h1)
        h2 = self.net.layer1[2](output)  # conv2_3
        # layer2 (conv3_x)
        output = self.net.layer2[0](h2)
        h3 = self.net.layer2[1](output)  # conv3_2
        output = self.net.layer2[2](h3)
        h4 = self.net.layer2[3](output)  # conv3_4
        # layer3 (conv4_x)
        output = self.net.layer3[0](h4)
        output = self.net.layer3[1](output)
        h5 = self.net.layer3[2](output)  # conv4_3
        output = self.net.layer3[3](h5)
        output = self.net.layer3[4](output)
        h6 = self.net.layer3[5](output)  # conv4_6
        # layer4 (conv5_x)
        h7 = self.net.layer4[0](h6)      # conv5_1
        output = self.net.layer4[1](h7)
        h8 = self.net.layer4[2](output)  # conv5_3
        if output_last_feature:
            return h8
        return h1, h2, h3, h4, h5, h6, h7, h8


# Note: the model stays in training mode, so BatchNorm normalizes with the
# statistics of each single image; call model.eval() to use running statistics.
model = ResNetFeatureExtraction()
image_style_dir = "/content/gdrive/MyDrive/style_train"
style = glob.glob(image_style_dir + "/*")

layer_names = ["conv2_1", "conv2_3", "conv3_2", "conv3_4",
               "conv4_3", "conv4_6", "conv5_1", "conv5_3"]
layer_entropies = [[] for _ in layer_names]  # one list of per-image entropies per layer

with torch.no_grad():  # only activations are needed, no gradients
    for style_img in style[:10000]:
        try:
            img_style = Image.open(style_img)
            x = trans(img_style).unsqueeze_(0)
        except Exception:
            # e.g. unreadable file, image smaller than the 256x256 crop, or not 3 channels
            continue
        for entropies, h in zip(layer_entropies, model(x)):
            entropies.append(calculate_entropy(h))

# Average entropy per layer over all processed style images
for name, entropies in zip(layer_names, layer_entropies):
    print(name, np.mean(entropies))
```
Hi, I didn't find any obvious mistake. Your pretrained bar chart looks sensible. Why not just set pretrained=False in your previous code?
Also, why not just use my code to get the hidden features and compute the entropy? It should be simple.
My previous code calculates the entropy with a pretrained model, so I set pretrained=True; the latest code calculates it with a random model (as you suggested), so I set pretrained=False. I will try your idea.
I tried setting pretrained=False in my previous code, and here is my result. It is similar.
It is similar, but actually different, for example at conv3_4 and conv2_3. Hmm... I have no idea why your results look like this. I didn't check your code. Maybe try mine.
Hi, can you send me an email ([email protected])? I can send you our code for this part. Maybe it will help.
I have sent an email to you