semantic-pyramid-pytorch
Pre-trained classifier
In the pre-trained classifier, why is the output layer not a softmax classifier over the 365 labels? Also, are the classifier layers of the VGG classifier pre-trained as well?
@rosinality I am curious why you haven't changed the final classifier layer of the pre-trained network (VGG16) to output 365 classes instead of 1000. Is there a specific reason for this? It would be great if you could shed some light on this aspect.
If you want to use the Places365 labels, you would need to train or finetune the VGG network. Instead, this implementation uses the pretrained model directly, which is more convenient.
Thanks @rosinality. Then, are you feeding the class labels directly into the generator and discriminator? I thought you might be feeding in the labels predicted by the classifier.
Not directly; it uses the predicted logits as features.
So, can we manually feed in class labels for applications like re-painting etc.? Sorry, I didn't understand how you are feeding the class label into the generator.
Sorry, I made a mistake. Class labels from the dataset are fed to the generator using adaptive batch norm, and the logits from VGG are additionally used as features, just like the feature maps from the conv layers.
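For reference, here is a hedged sketch of class-conditional ("adaptive") batch norm, the mechanism described above: the class label selects a per-class scale and shift applied after a parameter-free BatchNorm. The class name and sizes are illustrative, not the repository's exact code.

```python
import torch
from torch import nn

class ConditionalBatchNorm2d(nn.Module):
    def __init__(self, num_features, num_classes):
        super().__init__()
        # affine=False: the learned scale/shift come from the class embedding
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.embed = nn.Embedding(num_classes, num_features * 2)
        # initialize scale to 1 and shift to 0
        self.embed.weight.data[:, :num_features].fill_(1.0)
        self.embed.weight.data[:, num_features:].zero_()

    def forward(self, x, class_id):
        out = self.bn(x)
        gamma, beta = self.embed(class_id).chunk(2, dim=1)
        return gamma[:, :, None, None] * out + beta[:, :, None, None]

cbn = ConditionalBatchNorm2d(64, 365)   # e.g. 365 Places classes
h = torch.randn(4, 64, 28, 28)
labels = torch.randint(0, 365, (4,))
print(cbn(h, labels).shape)  # torch.Size([4, 64, 28, 28])
```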
Thanks @rosinality! So in all, the inputs are: a noise vector, class labels fed through adaptive batch norm, and the features (masked or unmasked) from the pre-trained classifier.
Yes!
Thanks for your comments!
If the labels are provided through the adaptive batch norm, doesn't that mean that if we set self.norm=False, class labels will not be fed?
@rosinality Could you please tell me the final resolution of the image produced by the generator? I am not able to figure it out. Is it 224x224?
About batch norm: yes, if you set norm=False, then labels will not be used in the generator in the current implementation.
I have hard-coded the image size to 224, and it seems the authors also used 224px. You can increase the image size, but I don't know how VGG will behave at higher resolutions.
Ok, so the generated size is also 224x224.
Yes, it should be the same as the input image size because of the reconstruction loss.
Thanks!
@rosinality I was curious about the way you pass the class_ids in the form of 128-dimensional embeddings! Since they are passed independently to the architectures at different times, how does the model differentiate between the classes?
The model learns to use and differentiate class information because there is supervision from feature matching with images that correspond to the class labels, and from the projection discriminator, which uses class information during adversarial training.
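A minimal sketch of the projection-discriminator idea (Miyato & Koyama, 2018) mentioned above: the class label enters the discriminator's output as an inner product between a learned class embedding and the pooled image features. Names and sizes here are illustrative, not the repository's exact code.

```python
import torch
from torch import nn

class ProjectionHead(nn.Module):
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.linear = nn.Linear(feat_dim, 1)              # unconditional term
        self.embed = nn.Embedding(num_classes, feat_dim)  # class embedding

    def forward(self, feat, class_id):
        # feat: (N, feat_dim) pooled discriminator features
        out = self.linear(feat).squeeze(1)
        # projection term: <embed(y), feat> conditions the score on the class
        out = out + (self.embed(class_id) * feat).sum(dim=1)
        return out

head = ProjectionHead(128, 365)
feat = torch.randn(4, 128)
labels = torch.randint(0, 365, (4,))
print(head(feat, labels).shape)  # torch.Size([4])
```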
@rosinality Can you suggest what deletions/additions should be made if I want to run your code on Google Colab? It is showing a CUDA out of memory error. Also, it would be helpful if you could explain the significance of the accumulate() function for combining two models' parameters. Thanks!
If you want to train, I think reducing the batch size is the simplest way to reduce memory consumption, but it will hurt the performance of the model. The accumulate function makes an EMA (exponential moving average) of the model, and in many cases using an EMA is crucial for GAN performance.
Thanks for the inputs!
@rosinality I found that after reducing the number of iterations (and removing EMA), the model started training. Do you know how it was able to allocate memory after reducing the number of iterations?
Hmm, the number of iterations should be irrelevant to memory consumption. Also, I don't quite understand what you mean by "allocate memory after reducing the number of iterations".
Actually, the previous error was that CUDA was not able to allocate memory. After reducing the iterations, it didn't give any error! Also, I don't understand why, when I set norm=True, it shows an error related to the discriminator inputs: "new(): argument 'size' must be tuple of ints, but found element of type NoneType at pos 2". If you have any insights, that would be helpful!
Where did you modify it to be norm=True? Also, could you give me the full error log?
Sure, I'll give it to you in a sec!
As the batch normalization used in the resblock is the adaptive one, it requires additional inputs such as class conditions, so you can't use it in the discriminator.
In my use case, images have to be generated conditioned on the class_id (as in conditional GANs). You said before that if I didn't pass norm=True, the generator would not learn to produce images of a particular class_id. Please elaborate on this. Also, by "class conditions" do you mean the class_ids? It would be of great help!
Yes, in this case the class conditions are injected using class ids, and the generator uses class ids by default since norm=True in the generator. (https://github.com/rosinality/semantic-pyramid-pytorch/blob/master/model.py#L316)
Ahh! Thanks, so I don't need to set norm=True in the discriminator.
Right, the discriminator uses class information via the projection discriminator, so you don't need it.