
Questions about "get_class_embedding.py"

xiaoxiaomiao39 opened this issue 2 years ago · 4 comments

Many thanks for this excellent work. I am trying to use the dataset (https://drive.google.com/drive/folders/1Ytv02FEMk_n_qJui8-fKowr5xKZTpYWb?usp=sharing) and the pretrained pSp model (https://drive.google.com/drive/folders/1gTSghHGuwoj9gKsLc2bcUNF6ioFBpRWB?usp=sharing) you provided to get the class embeddings:

 python tools/get_class_embedding.py \
--class_embedding_path=save/class/embeddings \
--psp_checkpoint_path=pretrained/pSp/psp_animalfaces.pt \
--train_data_path=data/age_animal/animal_faces/train/ \
--test_batch_size=4 \
--test_workers=4

but it doesn't work; did I miss something?

FileNotFoundError: [Errno 2] No such file or directory: 'experiment/logs/flowers/checkpoints/iteration_80000.pt'

xiaoxiaomiao39 avatar Apr 08 '22 06:04 xiaoxiaomiao39

You should set the value of --checkpoint_path to None in options/test_options.py.
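For anyone hitting the same error: the change amounts to something like the following in options/test_options.py (a minimal sketch assuming the usual pSp-style argparse layout; the actual surrounding code may differ):

# options/test_options.py (sketch, not the verbatim file)
self.parser.add_argument('--checkpoint_path',
                         default=None,  # was a hard-coded path like 'experiment/logs/flowers/checkpoints/iteration_80000.pt'
                         type=str,
                         help='Path to the model checkpoint to load')

With the default set to None, get_class_embedding.py falls back to the checkpoint you pass explicitly (here, --psp_checkpoint_path) instead of looking for the missing iteration file.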

UniBester avatar Apr 08 '22 06:04 UniBester

Thanks for the quick reply :) Yes, it works now.

xiaoxiaomiao39 avatar Apr 08 '22 06:04 xiaoxiaomiao39

Thanks again for the great work.

I have a few more questions after looking into the code.

  1. The dimension of ocodes is [18, 512]; why is the mean subtraction done only for the first 6 layers?
ocodes = self.encoder(x)               # W+ codes, shape [batch, 18, 512]
odw = ocodes[:, :6] - av_codes[:, :6]  # offset from the average code, first 6 layers only
dw, A, x = self.ax(odw)                # decompose the offset over the dictionary A; x holds the coefficients
codes = torch.cat((dw + av_codes[:, :6], ocodes[:, 6:]), dim=1)  # layers 6-17 pass through unchanged
  2. Is it important to normalize the codes with respect to the average face latent? How does the performance change if this is not done?
if self.opts.start_from_latent_avg:
    if self.opts.learn_in_w:
        codes = codes + self.latent_avg.repeat(codes.shape[0], 1)     # W space: add the average latent once
    else:
        codes = codes + self.latent_avg.repeat(codes.shape[0], 1, 1)  # W+ space: add it per layer
  3. Why are A and the coefficients x split into two groups?
class Ax(nn.Module):
    def __init__(self, dim):
        super(Ax, self).__init__()
        # dictionary A: one [512, dim] basis per layer, for 6 layers
        self.A = nn.Parameter(torch.randn(6, 512, dim), requires_grad=True)
        self.encoder0 = EqualLinear(512, dim)  # coefficients for layers 0-2
        self.encoder1 = EqualLinear(512, dim)  # coefficients for layers 3-5

    def forward(self, dw):
        # encode each group of layers into its own set of coefficients
        x0 = self.encoder0(dw[:, :3]).unsqueeze(-1).unsqueeze(1)
        x1 = self.encoder1(dw[:, 3:6]).unsqueeze(-1).unsqueeze(1)
        x = [x0.squeeze(-1), x1.squeeze(-1)]
        # reconstruct the offsets from the dictionary, group by group
        output_dw0 = torch.matmul(self.A[:3], x0).squeeze(-1)
        output_dw1 = torch.matmul(self.A[3:6], x1).squeeze(-1)
        output_dw = torch.cat((output_dw0, output_dw1), dim=1)
        return output_dw, self.A, x
  4. For the sparse loss, why divide by 32?
class SparseLoss(nn.Module):
    def __init__(self):
        super(SparseLoss, self).__init__()
        self.theta0 = 0.5
        self.theta1 = -1

    def forward(self, X):
        # soft sparsity penalty on the two coefficient groups
        x0 = torch.sigmoid(self.theta0 * X[0].abs() + self.theta1)
        x1 = torch.sigmoid(self.theta0 * X[1].abs() + self.theta1)
        return x0.sum() / 32 + x1.sum() / 32
  5. During the inference stage, how is A refined to Af? I can see that A is split into two groups (groups=[[0,1,2],[3,4,5]]), but I don't understand why.
def sampler(outputs, dist, opts):
    means = dist['mean']          # per-group means of the sparse coefficients
    means_abs = dist['mean_abs']  # per-group mean absolute values
    covs = dist['cov']            # per-group covariances
    one = torch.ones_like(torch.from_numpy(means[0]))
    zero = torch.zeros_like(torch.from_numpy(means[0]))
    dws = []
    groups = [[0, 1, 2], [3, 4, 5]]  # layers 0-2 and 3-5 are sampled separately
    for i in range(means.shape[0]):
        # sample coefficients from the class distribution of group i
        x = torch.from_numpy(np.random.multivariate_normal(mean=means[i], cov=covs[i], size=1)).float().cuda()
        # zero out directions whose average magnitude is below the threshold beta
        mask = torch.where(torch.from_numpy(means_abs[i]) > opts.beta, one, zero).cuda()
        x = x * mask
        for g in groups[i]:
            dw = torch.matmul(outputs['A'][g], x.transpose(0, 1)).squeeze(-1)
            dws.append(dw)
    dws = torch.stack(dws)
    # scale the sampled offsets by alpha and recombine with the original codes
    codes = torch.cat(((opts.alpha * dws.unsqueeze(0) + outputs['ocodes'][:, :6]), outputs['ocodes'][:, 6:]), dim=1)
    return codes

Looking forward to your reply!

xiaoxiaomiao39 avatar Apr 11 '22 12:04 xiaoxiaomiao39

Thank you for your attention.

  1. We only manipulate the first six layers because the lower layers of StyleGAN control the structural and surface attributes, while the higher layers only control hue, which is not beneficial for downstream tasks.
  2. Using the average latent is a trick from StyleGAN.
  3. We divide them into two groups, i.e. groups=[[0,1,2],[3,4,5]], to reduce the amount of computation and make the sampling more stable.
  4. We divide the sparse loss by 32 to keep it at a certain order of magnitude; the exact constant is not important.
  5. Beta is the threshold used to refine A: as the mask in sampler shows, coefficient entries whose mean absolute value falls below beta are zeroed out, so only the consistently active directions of A are kept (see the sketch after this list).
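To illustrate answer 5, here is a minimal, self-contained sketch of the beta-threshold masking from sampler above. The values of dim, beta, and the class statistics are placeholders, not values from the repo:

import numpy as np
import torch

dim = 16     # placeholder sparse-code dimension
beta = 0.05  # placeholder threshold standing in for opts.beta

# placeholder per-class statistics of the sparse coefficients,
# standing in for dist['mean'], dist['mean_abs'], dist['cov']
mean = np.zeros(dim)
mean_abs = np.abs(np.random.randn(dim))
cov = np.eye(dim)

# sample coefficients for one group from the class distribution ...
x = torch.from_numpy(
    np.random.multivariate_normal(mean=mean, cov=cov, size=1)
).float()

# ... then zero every entry whose average magnitude over the class is
# below beta; this is the "refinement" of A: a masked entry disables
# the corresponding column (direction) of the dictionary A
mask = (torch.from_numpy(mean_abs) > beta).float()
x = x * mask

Only the directions that are consistently active for the class survive the mask, which is why the sampling stays stable even though x is drawn at random.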

UniBester avatar Apr 14 '22 03:04 UniBester