StyleCLIP
Preprocessing Bug
There's a small bug in run_optimization.py that could affect the quality of the results. The optimization seems to be learning around it.
Bug
The output of StyleGAN is directly passed into CLIP here.
How to fix
- StyleGAN outputs values in the range [-1, 1], but some values fall outside that range, so the output needs to be clamped.
- The values need to be rescaled from [-1, 1] to [0, 1].
- The values need to be normalized using CLIP's preprocessing statistics (see the sketch below).
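A minimal sketch of those three steps, assuming image is a StyleGAN output tensor of shape (N, 3, H, W) that is nominally in [-1, 1]; the mean/std constants are CLIP's own preprocessing statistics from https://github.com/openai/CLIP/blob/main/clip/clip.py#L82, and the function name here is illustrative, not the actual run_optimization.py code:

import torch

# CLIP's per-channel preprocessing statistics (RGB mean and std)
clip_mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
clip_std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

def preprocess_for_clip(image):
    # image: StyleGAN output of shape (N, 3, H, W), nominally in [-1, 1]
    image = image.clamp(-1, 1)               # 1. clamp stray values back into [-1, 1]
    image = (image + 1) / 2                  # 2. rescale from [-1, 1] to [0, 1]
    image = (image - clip_mean) / clip_std   # 3. normalize with CLIP's statistics
    return image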
Hi @cysmith, nice catch! Thank you for bringing it up 😊 I will try to solve it soon, but I invite you to open a PR 😁 Anyway, I will update here when it is solved.
The following may work (normalization stats taken from https://github.com/openai/CLIP/blob/main/clip/clip.py#L82):
import torch
import clip

class CLIPLoss(torch.nn.Module):
    def __init__(self, opts):
        super(CLIPLoss, self).__init__()
        self.model, self.preprocess = clip.load("ViT-B/32", device="cuda")
        self.face_pool = torch.nn.AdaptiveAvgPool2d((224, 224))
        # CLIP's preprocessing statistics, shaped to broadcast over (N, 3, H, W)
        self.mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device="cuda").view(1, 3, 1, 1)
        self.std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device="cuda").view(1, 3, 1, 1)

    def forward(self, image, text):
        image = image.clamp(-1, 1)                   # clamp stray StyleGAN values into [-1, 1]
        image = image.add(1).div(2)                  # rescale from [-1, 1] to [0, 1]
        image = image.sub(self.mean).div(self.std)   # normalize with CLIP's statistics
        image = self.face_pool(image)                # resize to CLIP's 224x224 input resolution
        similarity = 1 - self.model(image, text)[0] / 100
        return similarity
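A hypothetical usage sketch for the class above, assuming a CUDA device; the prompt, the random stand-in for the StyleGAN output, and the variable names are illustrative assumptions, not code from run_optimization.py:

import torch
import clip

clip_loss = CLIPLoss(opts=None)                      # opts is unused in __init__ above
text = clip.tokenize(["a person with blue eyes"]).cuda()

# A random tensor stands in for the StyleGAN generator output in [-1, 1]
generated_image = torch.randn(1, 3, 1024, 1024, device="cuda", requires_grad=True)

loss = clip_loss(generated_image, text)              # (1, 1) loss; lower means image and text match better
loss.backward()                                      # gradients flow back toward the image / latent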