swag Version of InceptionNet

Version of InceptionNet

Open sonnguyen129 opened this issue 2 years ago • 25 comments

When reading the paper, I saw that Figure 6 shows results for WideResNet and Inception. Could you provide the code, or the idea, for implementing SWAG with these two models? Thank you so much.

Jul 11 '21 18:07 sonnguyen129

Hi, the code for these models hasn't been polished yet; I will upload it once it is. The idea is the same as for ResNet: just add smoothing before computing the loss. The implementation details can be found in the paper and the supplementary material.
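
A rough sketch of what "smoothing before the computation of the loss" could look like for a Gram-matrix L2 style loss. Several things here are assumptions rather than the repository's code: that the smoothing is a softmax, that it is taken over all activations of a layer, and that the Gram matrix is divided by the number of spatial positions. The helper names (`smooth`, `gram_matrix`, `style_loss`) and the toy feature maps are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def smooth(features):
    """Softmax over all activations of a C x N feature map (axis choice is an assumption)."""
    n = len(features[0])
    flat = softmax([v for row in features for v in row])
    return [flat[i * n:(i + 1) * n] for i in range(len(features))]


def gram_matrix(features):
    """C x C Gram matrix of a C x N feature map, divided by N (normalization is an assumption)."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_loss(feat_a, feat_b, use_smoothing=True):
    """Squared L2 distance between Gram matrices, optionally after smoothing."""
    if use_smoothing:
        feat_a, feat_b = smooth(feat_a), smooth(feat_b)
    g_a, g_b = gram_matrix(feat_a), gram_matrix(feat_b)
    return sum((x - y) ** 2
               for row_a, row_b in zip(g_a, g_b)
               for x, y in zip(row_a, row_b))


# Toy 2-channel, 3-position feature maps (made-up numbers)
style_feat = [[0.0, 4.0, 0.5], [9.0, 0.1, 0.2]]
output_feat = [[0.3, 3.5, 0.0], [7.0, 0.4, 0.1]]
loss_plain = style_loss(style_feat, output_feat, use_smoothing=False)
loss_smooth = style_loss(style_feat, output_feat, use_smoothing=True)
```

The only difference between the two losses is the `smooth` call in front of the Gram matrix; the rest of the style-transfer pipeline would stay as in the ResNet case.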

Jul 11 '21 20:07 peiwang062

Thank you for your answer. I have another question. Figure 3 shows activation statistics of different random architectures. Can you tell me how to calculate the entropy of those layers, and share a paper, article, or other information about that method? Thank you so much.

Jul 12 '21 21:07 sonnguyen129

Equations 5-8 describe the calculation. In fact, it is no different from the basic textbook entropy; we just normalize it to [0,1], which is also a standard operation.
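
For orientation, the normalized entropy can be written as below. This is reconstructed from the code snippet posted later in this thread (softmax over the flattened feature map, then entropy divided by the log of the number of elements), not copied from Eqs. 5-8 of the paper; the symbols $a_i$ (the $i$-th activation of a layer) and $N$ (the number of activations) are chosen here for illustration.

```latex
p_i = \frac{\exp(a_i)}{\sum_{j=1}^{N} \exp(a_j)}, \qquad
\tilde{H} = -\frac{1}{\log N} \sum_{i=1}^{N} p_i \log p_i, \qquad
0 \le \tilde{H} \le 1 .
```

The upper bound is reached when all $p_i$ equal $1/N$, which is why dividing by $\log N$ maps the entropy to $[0,1]$.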

Jul 12 '21 22:07 peiwang062

Thank you for your answer. As I understand it, lower Gram entropy causes poorer stylization performance. I have two more questions.

1. Gram entropy is lower in deeper layers (these layers usually learn features that are important for the classification task). Do deeper layers make stylization worse than shallow ones?
2. Is stylization the opposite of classification? That is, does a good classification model have high entropy in deep layers, while stylization has low entropy in deep layers?

Jul 13 '21 17:07 sonnguyen129

For 1: first of all, everything discussed in the paper is within the scope of methods that optimize an L2 loss on Gram matrices (the most popular and common kind, which includes the perceptual loss family as well). Lower entropy just makes the L2 optimization hard, and so makes it hard to match the distributions of the style image and the target image. From my understanding, the deeper layers do encode more informative features of the images, which may be helpful for stylization; the issue is just that they are hard to match. In other words, by using a loss other than L2, we might be able to leverage the deeper layers and get better stylized results. For other methods like AdaIN, I am not sure whether deeper layers ruin the performance. We haven't explored this.

For 2: some of this was covered under question 1. In my opinion, high entropy just makes the optimization easier in the L2 and Gram matrix case. It is hard to say that high entropy is always better than low entropy for stylization. It may depend on the loss, for example, or on something else.

Jul 13 '21 17:07 peiwang062

I have a question that is outside the scope of the paper. Can we confirm that for SOTA classification models (VGG, ResNet, Inception, ...) the deeper layers have higher entropy (activation entropy) than the shallow ones?

Jul 13 '21 18:07 sonnguyen129

I ran my experiments mainly on random networks. The observation is that deeper layers have lower entropy, for example in ResNet. For well-trained models the same observation holds, but the layer-by-layer discrepancy is not as clear. I think this makes sense: in a well-trained model the deep layers encode high-level semantics. For example, in the last classification layer ideally only one node is activated, to indicate the predicted class, so the entropy is low. In shallow layers more neurons are activated, because they encode low-level features, of which one image contains many. Consequently, the entropy is relatively high.
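
The "one node activated" intuition can be checked on toy vectors. The sketch below is a standard-library re-implementation of softmax followed by normalized entropy, not the code used for the paper; the function name `normalized_softmax_entropy` and the two example vectors are made up for illustration.

```python
import math


def normalized_softmax_entropy(activations):
    """Softmax over the activations, then Shannon entropy divided by log(N)."""
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


# "Deep-layer-like" response: a single strongly activated unit
peaked = [20.0] + [0.0] * 9
# "Shallow-layer-like" response: many units equally active
spread = [1.0] * 10

peaked_entropy = normalized_softmax_entropy(peaked)
spread_entropy = normalized_softmax_entropy(spread)
```

Here `peaked_entropy` comes out close to 0 and `spread_entropy` close to 1, which matches the direction of the argument above. Real feature maps sit somewhere in between.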

Jul 13 '21 20:07 peiwang062

Thank you so much for explaining to me. I really like the idea of your paper so I want to understand it thoroughly. I have understood a lot. Wish you all health, success and happiness!

Jul 13 '21 21:07 sonnguyen129

Did you implement equations 5-8 from scratch, or use a built-in module/function? Can you share the code? Thanks in advance.

Jul 18 '21 13:07 sonnguyen129

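Yes, built-in. Something like this:

import numpy as np
from scipy.special import softmax  # assumed source of softmax, as in the code later in this thread
from scipy.stats import entropy

# featuremap: activation tensor of one layer for one image
feature = featuremap.cpu().data.numpy().squeeze()
feature_stat = feature.flatten()
feature_stat = softmax(feature_stat)  # softmax over all activations of the layer
# Shannon entropy divided by log(N), so the result lies in [0, 1]
cur_entropy = entropy(feature_stat) / np.log(len(feature_stat.tolist()))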

Jul 18 '21 16:07 peiwang062

I tried this on a pretrained VGG, but with two different input images the entropy values are different. I think I made a mistake; I expected the entropy to be fixed regardless of the input. Can you explain it? Thanks in advance.

Jul 18 '21 18:07 sonnguyen129

For different style images, the entropy is different. What is shown in our plots is the average.
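
A small sketch of the averaging step, assuming the per-layer value in the plots is a plain mean of per-image normalized entropies. The layer names, the toy activation vectors, and the helper `normalized_softmax_entropy` are hypothetical; in practice the vectors would be flattened feature maps from the network.

```python
import math
from collections import defaultdict
from statistics import mean


def normalized_softmax_entropy(activations):
    """Softmax over the activations, then Shannon entropy divided by log(N)."""
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0) / math.log(len(probs))


# Made-up flattened activations per layer for two style images
per_image_features = [
    {"conv2_1": [0.1, 0.4, 0.2, 0.3], "conv5_3": [0.0, 5.0, 0.0, 0.1]},
    {"conv2_1": [0.5, 0.1, 0.9, 0.2], "conv5_3": [3.0, 0.0, 0.2, 0.0]},
]

# Collect one entropy value per (layer, image), keyed by layer name
entropies = defaultdict(list)
for features in per_image_features:
    for layer, activations in features.items():
        entropies[layer].append(normalized_softmax_entropy(activations))

# The per-layer number that would go into a bar chart
mean_entropy = {layer: mean(values) for layer, values in entropies.items()}
```

Each image yields its own entropy per layer, so the individual values differ; only the mean over many images is comparable between layers. Keying the results by layer name also avoids keeping one list variable per layer.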

Jul 18 '21 21:07 peiwang062

How many style images did you test?

Jul 18 '21 21:07 sonnguyen129

[Image: vgg_1000styles_8layers]

I tried calculating the entropy with 10000 styles. The result is not what I expected, which was for the entropy to decrease in the deeper layers. Did I make a mistake?

Jul 21 '21 15:07 sonnguyen129

I see you are using VGG19. For VGG, the entropy doesn't decrease; that is the case for ResNet.

Jul 21 '21 16:07 peiwang062

[Image: resnet_10000styles (1)]

I don't get the same result as you, and I can't find my mistake. This is my code:

import torch
from torchvision.models.resnet import resnet34
from torchvision import transforms
import torch.nn as nn
import cv2
import matplotlib.pyplot as plt
from PIL import Image
from scipy.stats import entropy
from scipy.special import softmax
import numpy as np
import os
import PIL
import glob
PIL.Image.MAX_IMAGE_PIXELS = 933120000
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Flatten the pretrained ResNet-34 into one sequence: stem layers, then the residual blocks
res = resnet34(pretrained=True)
res1 = nn.Sequential(*list(res.children()))
all_layers = [*res1[:4], *res1[4], *res1[5], *res1[6], *res1[7], *res1[8:]]
new_sequential = nn.Sequential(*all_layers)

# Consecutive slices; each one ends at the block named in its comment
res_slice1 = new_sequential[:5]      # conv2_1
res_slice2 = new_sequential[5:7]     # conv2_3
res_slice3 = new_sequential[7:9]     # conv3_2
res_slice4 = new_sequential[9:11]    # conv3_4
res_slice5 = new_sequential[11:14]   # conv4_3
res_slice6 = new_sequential[14:17]   # conv4_6
res_slice7 = new_sequential[17:18]   # conv5_1
res_slice8 = new_sequential[18:20]   # conv5_3

trans = transforms.Compose([transforms.RandomCrop(256),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])

def calculate_entropy(featuremap):
    feature = featuremap.cpu().data.numpy().squeeze()
    feature_stat = feature.flatten()
    feature_stat = softmax(feature_stat)
    cur_entropy = entropy(feature_stat) / np.log(len(feature_stat.tolist()))
    return cur_entropy

def extractor(image):
    h1 = res_slice1(image)
    h2 = res_slice2(h1)
    h3 = res_slice3(h2)
    h4 = res_slice4(h3)
    h5 = res_slice5(h4)
    h6 = res_slice6(h5)
    h7 = res_slice7(h6)
    h8 = res_slice8(h7)
    return h1, h2, h3, h4, h5, h6, h7, h8
   
image_style_dir ="/content/gdrive/MyDrive/style-transfer-research/style_train"
style = glob.glob(image_style_dir + "/*")
layer1, layer2, layer3, layer4, layer5, layer6, layer7, layer8 = [], [], [], [], [], [], [], []
for style_img in style[:10000]:
    try:
        img_style = Image.open(style_img)
        x = trans(img_style).unsqueeze_(0)
    except:
        continue
    h1, h2, h3, h4, h5, h6, h7, h8 = extractor(x)

    ent_h1, ent_h2, ent_h3, ent_h4, ent_h5, ent_h6, ent_h7, ent_h8 = calculate_entropy(h1), calculate_entropy(h2), calculate_entropy(h3), calculate_entropy(h4), calculate_entropy(h5), calculate_entropy(h6), calculate_entropy(h7), calculate_entropy(h8)
    layer1.append(ent_h1)
    layer2.append(ent_h2)
    layer3.append(ent_h3)
    layer4.append(ent_h4)
    layer5.append(ent_h5)
    layer6.append(ent_h6)
    layer7.append(ent_h7)
    layer8.append(ent_h8)

print(np.mean(layer1))   
print(np.mean(layer2))   
print(np.mean(layer3))   
print(np.mean(layer4))   
print(np.mean(layer5))   
print(np.mean(layer6))   
print(np.mean(layer7))   
print(np.mean(layer8))

Jul 24 '21 23:07 sonnguyen129

Hi, I didn't check your code carefully, but two clear differences may explain why our histograms differ. First, the experiments in Figure 3 of my paper are based on random networks, but I see you loaded a pretrained one. In our experiments, the entropy decrease is indeed less distinct for pretrained models than for randomly initialized ones, but even so, you can see in your histogram that it does drop up to conv5_3. Second, in my experiment I used ResNet-50, not ResNet-34. According to our analysis, the deeper the network, the clearer such entropy differences are. But this should not be the main issue. So I guess that when you switch to a random model, you will get the same figure as in our paper.

Jul 24 '21 23:07 peiwang062

[Image: resnet50_random_10000styles]

Here is my result. I think I made a mistake, but I can't find it. Do you have any idea? Thanks in advance.

import torch 
from torchvision.models.resnet import resnet50
from torchvision import transforms
import torch.nn as nn
import cv2
import matplotlib.pyplot as plt
from PIL import Image
from scipy.stats import entropy
from scipy.special import softmax
import numpy as np
import os
import PIL
import glob
PIL.Image.MAX_IMAGE_PIXELS = 933120000
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

trans = transforms.Compose([transforms.RandomCrop(256),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])

def calculate_entropy(featuremap):
    feature = featuremap.cpu().data.numpy().squeeze()
    feature_stat = feature.flatten()
    feature_stat = softmax(feature_stat)
    cur_entropy = entropy(feature_stat) / np.log(len(feature_stat.tolist()))
    return cur_entropy

class ResNetFeatureExtraction(nn.Module):
    def __init__(self):
        super(ResNetFeatureExtraction, self).__init__()
        self.net = resnet50(pretrained=False)
 
    def forward(self, input, output_last_feature=False):
        output = self.net.conv1(input)
        output = self.net.bn1(output)
        output = self.net.relu(output)
        output = self.net.maxpool(output)

        h1 = self.net.layer1[0](output)           #conv2_1
        output = self.net.layer1[1](h1)
        h2 = self.net.layer1[2](output)           #conv2_3

        output = self.net.layer2[0](h2)           
        h3 = self.net.layer2[1](output)           #conv3_2
        output = self.net.layer2[2](h3)
        h4 = self.net.layer2[3](output)           #CONV3_4

        output = self.net.layer3[0](h4)
        output = self.net.layer3[1](output)
        h5 = self.net.layer3[2](output)           #conv4_3
        output = self.net.layer3[3](h5)
        output = self.net.layer3[4](output)
        h6 = self.net.layer3[5](output)           #conv4_6

        h7 = self.net.layer4[0](h6)               #conv5_1
        output = self.net.layer4[1](h7)
        h8 = self.net.layer4[2](output)           #conv5_3

        if output_last_feature:
            return h8
        else:
            return h1, h2, h3, h4, h5, h6, h7, h8
    
model = ResNetFeatureExtraction()
image_style_dir ="/content/gdrive/MyDrive/style_train"
style = glob.glob(image_style_dir + "/*")
layer1, layer2, layer3, layer4, layer5, layer6, layer7, layer8 = [], [], [], [], [], [], [], []
for style_img in style[:10000]:
    try:
        img_style = Image.open(style_img)
        x = trans(img_style).unsqueeze_(0)
    except:
        continue
    h1, h2, h3, h4, h5, h6, h7, h8 = model(x)
   
    ent_h1, ent_h2, ent_h3, ent_h4, ent_h5, ent_h6, ent_h7, ent_h8 = calculate_entropy(h1), calculate_entropy(h2), calculate_entropy(h3), calculate_entropy(h4), \
                                                                      calculate_entropy(h5), calculate_entropy(h6), calculate_entropy(h7), calculate_entropy(h8)
    layer1.append(ent_h1)
    layer2.append(ent_h2)
    layer3.append(ent_h3)
    layer4.append(ent_h4)
    layer5.append(ent_h5)
    layer6.append(ent_h6)
    layer7.append(ent_h7)
    layer8.append(ent_h8)

print(np.mean(layer1))   
print(np.mean(layer2))   
print(np.mean(layer3))   
print(np.mean(layer4))   
print(np.mean(layer5))   
print(np.mean(layer6))   
print(np.mean(layer7))   
print(np.mean(layer8))

Aug 05 '21 01:08 sonnguyen129

Hi, I didn't find any obvious mistake. Your pretrained bar chart looks sensible. Why not just set pretrained=False in your previous code?

Aug 05 '21 01:08 peiwang062

Also, why not just use my code to get the hidden features and compute the entropy? I think it is simple.

Aug 05 '21 02:08 peiwang062

My previous code calculates the entropy with a pretrained model, so I set pretrained=True; the latest code calculates it with a random model (as you suggested), so I set pretrained=False. I will try your idea.

Aug 05 '21 02:08 sonnguyen129

I tried setting pretrained=False in my previous code, and here is my result. It is similar.

[Image: resnet_random_10000styles (1)]

Aug 05 '21 02:08 sonnguyen129

It is similar but in fact different, for example at conv3_4 and conv2_3. Hmm... No idea why your results look like this. I didn't check your code. Maybe try mine.

Aug 05 '21 02:08 peiwang062

Hi, can you drop me an email ([email protected])? I can send our code for this part. Maybe this can help.

Aug 05 '21 03:08 peiwang062

I have sent an email to you

Aug 05 '21 03:08 sonnguyen129
