GazeOnce Questions about implementation

Hello, I am trying to reproduce your great work based on the GitHub repo you mentioned before, but the results do not seem good. I'm new to object detection, so there may be errors in my code. I'd like to clarify two things:

1. In SSD, face locations and landmarks are encoded against the prior boxes (using the variances) when computing the losses, and then decoded again. I'm not sure whether there is any way to encode gaze yaw and pitch (they are in neither point form nor center form). Or do we not need to encode gaze the way landmarks and locations are encoded at all?
2. For the downstream 3D gaze head, can I just copy the architecture of the location head and landmark head?

import torch.nn as nn

class LandmarkHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=3):
        super(LandmarkHead,self).__init__()
        # 5 landmarks x 2 coordinates = 10 outputs per anchor
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*10,kernel_size=(1,1),stride=1,padding=0)
    def forward(self,x):
        out = self.conv1x1(x)
        # (B, C, H, W) -> (B, H, W, C) so each anchor's values are contiguous
        out = out.permute(0,2,3,1).contiguous()

        return out.view(out.shape[0], -1, 10)

class GazeHead(nn.Module):
    def __init__(self,inchannels=512,num_anchors=3):
        super(GazeHead,self).__init__()
        # 2 outputs per anchor: pitch and yaw
        self.conv1x1 = nn.Conv2d(inchannels,num_anchors*2,kernel_size=(1,1),stride=1,padding=0)

    def forward(self,x):
        out = self.conv1x1(x)
        out = out.permute(0,2,3,1).contiguous()

        return out.view(out.shape[0], -1, 2)

Thank you for any help you can provide.

Sep 19 '23 15:09 YuXiangLin1234

Hi, I built my code based on https://github.com/biubug6/Pytorch_Retinaface

1. I guess box locations and landmarks need to be encoded and decoded because faces come in various sizes. I think gaze requires no such process, since it is a normalized vector.
2. Yes, you can.
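
To make the difference concrete, here is a minimal sketch, not the actual GazeOnce training code. `encode_boxes` follows the usual SSD-style encoding against priors, with the variances (0.1, 0.2) assumed as defaults. `gaze_loss` is a hypothetical helper that regresses raw pitch and yaw on positive anchors without any encoding; the choice of smooth L1 is also an assumption.

```python
import torch
import torch.nn.functional as F

def encode_boxes(matched, priors, variances=(0.1, 0.2)):
    """SSD-style box encoding (sketch).

    matched: (N, 4) ground-truth boxes in corner form (x1, y1, x2, y2)
    priors:  (N, 4) prior boxes in center form (cx, cy, w, h)
    """
    # Center offset, scaled by the prior size and the first variance
    g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
    g_cxcy = g_cxcy / (variances[0] * priors[:, 2:])
    # Log size ratio, scaled by the second variance
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    return torch.cat([g_cxcy, g_wh], dim=1)

def gaze_loss(gaze_pred, gaze_target, pos_mask):
    """Hypothetical gaze loss: no encoding, raw (pitch, yaw) regression.

    gaze_pred, gaze_target: (B, A, 2); pos_mask: (B, A) bool, True for
    anchors matched to a face.
    """
    if pos_mask.sum() == 0:
        # No positive anchors: return a zero that keeps the graph intact
        return gaze_pred.sum() * 0.0
    return F.smooth_l1_loss(gaze_pred[pos_mask], gaze_target[pos_mask])
```

The box targets depend on the size of each prior, which is why they have to be decoded at inference; the gaze targets do not depend on the prior at all, so the head output can be used directly.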

Hope my answer helps. Thank you!

Sep 20 '23 01:09 mf-zhang

Thanks for your response. I set the weights of the different tasks to 1 and got better results (at the beginning I had set the weight of the gaze loss to 10 and got bad results).
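
For anyone reading later, the weighting I mean is roughly the following sketch. The task names (`cls`, `box`, `landm`, `gaze`) and the helper `combine_losses` are just illustrative, not names from the repo.

```python
def combine_losses(losses, weights=None):
    """Weighted sum of per-task losses (sketch).

    losses:  dict mapping task name -> scalar loss (float or tensor)
    weights: dict mapping task name -> weight; missing tasks default to 1
    """
    weights = weights or {}
    total = 0.0
    for name, value in losses.items():
        total = total + weights.get(name, 1.0) * value
    return total

# Illustrative values only, not measured losses
losses = {"cls": 1.0, "box": 2.0, "landm": 3.0, "gaze": 4.0}
equal = combine_losses(losses)                      # all weights 1
gaze_heavy = combine_losses(losses, {"gaze": 10.0})  # gaze term dominates
```

With a weight of 10 the gaze term can dominate the total, which may be why detection and gaze both trained worse for me in that setting.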

Sep 26 '23 00:09 YuXiangLin1234