
How to use BiSeNet based on your code?

Open wozhangzhaohui opened this issue 5 years ago • 11 comments

Thanks for sharing the code. Could you please explain how to use BiSeNet based on your code? In the paper, the attention map should contain three components: skin, hair, and other parts. But in your code: https://github.com/SeungyounShin/CAGFace/blob/master/dataloader.py#L115

        out , out16, out32 = self.bisenet(imageTensor)
        out   = out.max(dim=1)[0]
        out16 = out16.max(dim=1)[0]
        out32 = out32.max(dim=1)[0]
        prior = torch.cat((out,out16,out32), dim=0).view(inputShape)

Do you mean that out, out16, and out32 represent the three components? By the way, in BiSeNet, out16 and out32 are the auxiliary outputs used for the auxiliary losses described in their paper, and they are only used in the training phase.
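
For reference, a minimal sketch of the usual BiSeNet training setup (names like criterion and alpha are illustrative, not from this repo):

    # out is the main prediction; out16 and out32 come from the context path
    # and only feed the auxiliary losses while training.
    out, out16, out32 = bisenet(image)   # each: (N, n_classes, H, W)
    alpha = 1.0                          # auxiliary-loss weight (1 in the paper)
    loss = criterion(out, label) \
         + alpha * (criterion(out16, label) + criterion(out32, label))
    # At inference only out is used:
    pred = out.argmax(dim=1)             # (N, H, W)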

Could you please explain it ?

wozhangzhaohui commented on Feb 19 '20

I solved this problem by using the pretrained parsenet (provided by CelebA-Mask-HQ) instead of BiSeNet to get the attention map. The attention map looks like this: [attention map image]. The corresponding input image and mask result look like this: [input image] [mask image]

And the final result seems OK now! [result image]

wozhangzhaohui commented on Mar 03 '20

> I solved this problem by using the pretrained parsenet (provided by CelebA-Mask-HQ) instead of BiSeNet to get the attention map. [...]

Hi, can you please provide the code that you used to replace BiSeNet? I have tried the pretrained parsenet but can't figure out how it should be embedded. If you can provide the code, it would be really helpful. Thank you!

shimrah-mahjabeen commented on Mar 13 '20

> I solved this problem by using the pretrained parsenet (provided by CelebA-Mask-HQ) instead of BiSeNet to get the attention map. [...]
>
> Hi, can you please provide the code that you used to replace BiSeNet? [...]

Here is the code to get the attention map. The input imageTensor is a (1, 3, H, W) float tensor, and the output prior is also a (1, 3, H, W) float tensor. self.smoothing and self.normalize are the same as the original repo provides, and self.parsenet is the parsenet model from CelebA-Mask-HQ. The code assumes numpy as np, torch, and torch.nn.functional as F are imported.

def get_attention(self, imageTensor):
    # imageTensor: (1, 3, H, W) float tensor; parsenet predicts the 19
    # CelebA-Mask-HQ classes (0 = background, 1 = skin, ..., 13 = hair)
    parsenet_result = self.parsenet(imageTensor)
    parsenet_result = parsenet_result.view(19, self.imsize, self.imsize)
    # per-pixel max over the class dimension: [0] = score, [1] = label
    parsenet_result_max = parsenet_result.data.max(0)
    face_label = parsenet_result_max[1].cpu().numpy()
    face_label_prob = parsenet_result_max[0].cpu().numpy()
    # build maps for skin, hair, and the other face parts
    # (np.float is deprecated, so use np.float64 explicitly)
    skin = np.zeros((self.imsize, self.imsize), dtype=np.float64)
    hair = np.zeros((self.imsize, self.imsize), dtype=np.float64)
    otherpart = np.zeros((self.imsize, self.imsize), dtype=np.float64)
    skin_mask = face_label == 1                             # label 1: skin
    hair_mask = face_label == 13                            # label 13: hair
    otherpart_mask = (1 < face_label) & (face_label < 13)   # labels 2-12: other parts
    # copy the winning class scores into the corresponding component map
    skin[skin_mask] = face_label_prob[skin_mask]
    hair[hair_mask] = face_label_prob[hair_mask]
    otherpart[otherpart_mask] = face_label_prob[otherpart_mask]
    prior = torch.from_numpy(np.array([skin, hair, otherpart])).view(1, 3, self.imsize, self.imsize).float()
    # smoothing
    prior = F.pad(prior, (2, 2, 2, 2), mode='reflect')
    with torch.no_grad():
        prior = self.smoothing(prior)
    prior = self.normalize(prior.squeeze()).unsqueeze(0)
    return prior
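
The prior is then combined with the network input; my assumption (not necessarily the repo's exact code) is channel-wise concatenation:

    # Assumed usage sketch: stack the RGB image with the 3-channel prior
    # so the network sees a 6-channel input.
    prior = self.get_attention(imageTensor)             # (1, 3, H, W)
    net_input = torch.cat((imageTensor, prior), dim=1)  # (1, 6, H, W)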

wozhangzhaohui commented on Mar 20 '20

> Here is the code to get the attention map. [...]

I tried to integrate parsenet, but it still didn't work and I really don't know how to do it. Could you please send me your complete code? My email is [email protected]. It would be very useful for me, thank you so much!

kidwhh commented on Apr 09 '20

@wozhangzhaohui Hi, thank you for sharing your code; it did help me. However, I have one question that really confuses me: I replaced BiSeNet with parsenet and changed the learning rate, and training seems fine (the loss is about 0.001), but the model's results don't look as good as the ones you provided. So I want to know if you have any other tricks for training the model.

SanoPan commented on Apr 29 '20

> I replaced BiSeNet with parsenet and changed the learning rate [...] I want to know if you have any other tricks for training the model.

Here are some things that might help your training (see the sketch after this list):

  1. Change conv + BN + sigmoid to a single conv (with bias) in the SpatialUpSampling layer
  2. Change ReLU to LeakyReLU
  3. Set the learning rate to 0.1
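
A minimal sketch of what tips 1 and 2 could look like (illustrative class, not the repo's actual code):

    import torch.nn as nn

    class SpatialUpSampling(nn.Module):
        def __init__(self, in_ch, out_ch, scale=2):
            super().__init__()
            self.up = nn.Upsample(scale_factor=scale, mode='bilinear', align_corners=False)
            # Tip 1: a single biased conv replaces conv + BN + sigmoid
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=True)

        def forward(self, x):
            return self.conv(self.up(x))

    # Tip 2: use nn.LeakyReLU(0.2) where the network previously used nn.ReLU()
    # Tip 3: e.g. torch.optim.SGD(model.parameters(), lr=0.1)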

wozhangzhaohui commented on Apr 29 '20

> I tried to integrate parsenet, but it still didn't work [...] could you please send me your complete code?

Sorry, I cannot provide the complete code. You can share your error message here and I will try to help with it.

wozhangzhaohui commented on Apr 29 '20

> Here are some things that might help your training: [...]

@wozhangzhaohui Well, thank you for your suggestions. I found that my results have a color cast; I wonder if you ever ran into this issue. My results are as follows: [result image]

SanoPan commented on Apr 30 '20

> I found that my results have a color cast [...] My results are as follows.

I met the same issue. My guess is that training the BN layers with a small batch size might cause this problem, so using a larger batch size might help. But in my experiments it runs out of memory (OOM) if the batch size is larger than 2. I haven't done any further experiments on this issue, so I am not sure how to solve it.
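
One untried idea: GroupNorm's statistics do not depend on the batch size, so swapping the BN layers for it might avoid the color cast. A hypothetical helper (not from the repo):

    import torch.nn as nn

    def bn_to_gn(module, num_groups=32):
        # Recursively replace BatchNorm2d with GroupNorm, which normalizes
        # over channel groups instead of the batch dimension.
        for name, child in module.named_children():
            if isinstance(child, nn.BatchNorm2d):
                g = num_groups if child.num_features % num_groups == 0 else 1
                setattr(module, name, nn.GroupNorm(g, child.num_features))
            else:
                bn_to_gn(child, num_groups)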

wozhangzhaohui commented on Apr 30 '20

> I met the same issue. My guess is that training the BN layers with a small batch size might cause this problem [...]

Same for me: I tried to raise the batch size but ran out of resources. Thank you for your answer.

SanoPan commented on Apr 30 '20

Does anyone know if this BiSeNet model could be used? https://github.com/osmr/imgclsmob/releases/tag/v0.0.462

It's supposedly trained on the CelebAHQ dataset.

LexCybermac commented on May 26 '20