deep-head-pose

How are yaw, roll and pitch calculated for the landmark-based networks?

Open cnaaq opened this issue 6 years ago • 7 comments

Hi and thanks for the great work :)

I have a question about Tables 1 and 2 in your paper. Some networks, like 3DDFA and FAN, only detect 68 landmarks on the image. Could you please explain how you used that information to calculate yaw, roll and pitch? Is there a paper or a mathematical derivation I could read as a source?

Many thanks!

cnaaq avatar Apr 24 '19 13:04 cnaaq

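import math

import cv2
import numpy as np

# 3D reference positions of the 68 facial landmarks (one x, y, z row per landmark).
landmarks_68_3D = np.array( [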
[-73.393523  , -29.801432   , 47.667532   ],
[-72.775014  , -10.949766   , 45.909403   ],
[-70.533638  , 7.929818     , 44.842580   ],
[-66.850058  , 26.074280    , 43.141114   ],
[-59.790187  , 42.564390    , 38.635298   ],
[-48.368973  , 56.481080    , 30.750622   ],
[-34.121101  , 67.246992    , 18.456453   ],
[-17.875411  , 75.056892    , 3.609035    ],
[0.098749    , 77.061286    , -0.881698   ],
[17.477031   , 74.758448    , 5.181201    ],
[32.648966   , 66.929021    , 19.176563   ],
[46.372358   , 56.311389    , 30.770570   ],
[57.343480   , 42.419126    , 37.628629   ],
[64.388482   , 25.455880    , 40.886309   ],
[68.212038   , 6.990805     , 42.281449   ],
[70.486405   , -11.666193   , 44.142567   ],
[71.375822   , -30.365191   , 47.140426   ],
[-61.119406  , -49.361602   , 14.254422   ],
[-51.287588  , -58.769795   , 7.268147    ],
[-37.804800  , -61.996155   , 0.442051    ],
[-24.022754  , -61.033399   , -6.606501   ],
[-11.635713  , -56.686759   , -11.967398  ],
[12.056636   , -57.391033   , -12.051204  ],
[25.106256   , -61.902186   , -7.315098   ],
[38.338588   , -62.777713   , -1.022953   ],
[51.191007   , -59.302347   , 5.349435    ],
[60.053851   , -50.190255   , 11.615746   ],
[0.653940    , -42.193790   , -13.380835  ],
[0.804809    , -30.993721   , -21.150853  ],
[0.992204    , -19.944596   , -29.284036  ],
[1.226783    , -8.414541    , -36.948060  ],
[-14.772472  , 2.598255     , -20.132003  ],
[-7.180239   , 4.751589     , -23.536684  ],
[0.555920    , 6.562900     , -25.944448  ],
[8.272499    , 4.661005     , -23.695741  ],
[15.214351   , 2.643046     , -20.858157  ],
[-46.047290  , -37.471411   , 7.037989    ],
[-37.674688  , -42.730510   , 3.021217    ],
[-27.883856  , -42.711517   , 1.353629    ],
[-19.648268  , -36.754742   , -0.111088   ],
[-28.272965  , -35.134493   , -0.147273   ],
[-38.082418  , -34.919043   , 1.476612    ],
[19.265868   , -37.032306   , -0.665746   ],
[27.894191   , -43.342445   , 0.247660    ],
[37.437529   , -43.110822   , 1.696435    ],
[45.170805   , -38.086515   , 4.894163    ],
[38.196454   , -35.532024   , 0.282961    ],
[28.764989   , -35.484289   , -1.172675   ],
[-28.916267  , 28.612716    , -2.240310   ],
[-17.533194  , 22.172187    , -15.934335  ],
[-6.684590   , 19.029051    , -22.611355  ],
[0.381001    , 20.721118    , -23.748437  ],
[8.375443    , 19.035460    , -22.721995  ],
[18.876618   , 22.394109    , -15.610679  ],
[28.794412   , 28.079924    , -3.217393   ],
[19.057574   , 36.298248    , -14.987997  ],
[8.956375    , 39.634575    , -22.554245  ],
[0.381549    , 40.395647    , -23.591626  ],
[-7.428895   , 39.836405    , -22.406106  ],
[-18.160634  , 36.677899    , -15.121907  ],
[-24.377490  , 28.677771    , -4.785684   ],
[-6.897633   , 25.475976    , -20.893742  ],
[0.340663    , 26.014269    , -22.220479  ],
[8.444722    , 25.326198    , -21.025520  ],
[24.474473   , 28.323008    , -5.712776   ],
[8.449166    , 30.596216    , -20.671489  ],
[0.205322    , 31.408738    , -21.903670  ],
[-7.198266   , 30.844876    , -20.328022  ] ], dtype=np.float32)

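def rotationMatrixToEulerAngles(R) :
    # Decompose R = Rz(z) * Ry(y) * Rx(x) into the angles x, y, z (radians).
    sy = math.sqrt(R[0,0] * R[0,0] +  R[1,0] * R[1,0])
    singular = sy < 1e-6  # cos(y) is close to zero (gimbal lock)
    if  not singular :
        x = math.atan2(R[2,1] , R[2,2])
        y = math.atan2(-R[2,0], sy)
        z = math.atan2(R[1,0], R[0,0])
    else :
        # x and z cannot be separated here, so z is fixed to 0.
        x = math.atan2(-R[1,2], R[1,1])
        y = math.atan2(-R[2,0], sy)
        z = 0
    return np.array([x, y, z])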

# Returns pitch, yaw, roll: radians scaled by 1.25 and clipped to [-1...+1].
def estimate_pitch_yaw_roll(aligned_256px_landmarks):
    shape = (256,256)
    # Approximate intrinsics: focal length = image width, principal point = image centre.
    focal_length = shape[1]
    camera_center = (shape[1] / 2, shape[0] / 2)
    camera_matrix = np.array(
        [[focal_length, 0, camera_center[0]],
         [0, focal_length, camera_center[1]],
         [0, 0, 1]], dtype=np.float32)

    # Fit the 3D reference landmarks to the detected 2D landmarks.
    # The zero vector means no lens distortion is modelled.
    (_, rotation_vector, translation_vector) = cv2.solvePnP(
        landmarks_68_3D,
        aligned_256px_landmarks.astype(np.float32),
        camera_matrix,
        np.zeros((4, 1)) )

    # Rotation vector -> 3x3 rotation matrix -> Euler angles.
    pitch, yaw, roll = rotationMatrixToEulerAngles( cv2.Rodrigues(rotation_vector)[0] )
    pitch = np.clip ( pitch*1.25, -1.0, 1.0 )
    yaw = np.clip ( yaw*1.25, -1.0, 1.0 )
    roll = np.clip ( roll*1.25, -1.0, 1.0 )
    return pitch, yaw, roll

iperov avatar Apr 25 '19 04:04 iperov
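
Editor's note on the "mathematical way" asked about in the opening post: the Euler-angle step in the code above appears to assume the rotation is composed as R = Rz(z) · Ry(y) · Rx(x), with x, y, z read as pitch, yaw, roll. The sketch below is one way to check that reading without OpenCV: it builds a rotation matrix from known angles and recovers them with the same formulas. The names `euler_to_rotation_matrix` and `rotation_matrix_to_euler` are illustrative and do not come from the thread.

```python
import math

import numpy as np


def euler_to_rotation_matrix(x, y, z):
    """Build R = Rz(z) @ Ry(y) @ Rx(x) from angles given in radians."""
    cx, sx = math.cos(x), math.sin(x)
    cy, sy = math.cos(y), math.sin(y)
    cz, sz = math.cos(z), math.sin(z)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx


def rotation_matrix_to_euler(R):
    """Same decomposition as rotationMatrixToEulerAngles in the post above."""
    sy = math.hypot(R[0, 0], R[1, 0])  # equals |cos(y)|
    if sy >= 1e-6:
        x = math.atan2(R[2, 1], R[2, 2])
        y = math.atan2(-R[2, 0], sy)
        z = math.atan2(R[1, 0], R[0, 0])
    else:
        # Gimbal lock: x and z are not separable, so z is fixed to 0.
        x = math.atan2(-R[1, 2], R[1, 1])
        y = math.atan2(-R[2, 0], sy)
        z = 0.0
    return np.array([x, y, z])


# Round trip with arbitrary test angles (radians), all within (-pi/2, pi/2).
angles = np.array([0.3, -0.5, 0.1])
R = euler_to_rotation_matrix(*angles)
recovered = rotation_matrix_to_euler(R)
print(np.allclose(recovered, angles))
```

The round trip only holds while the middle angle y stays strictly between -90 and +90 degrees; outside that range the decomposition returns a different but equivalent triple.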

Hi @iperov and thank you very much for your reply. I have two questions:

  1. How did you build the camera_matrix? It should differ from case to case, right? Or am I mistaken?
  2. What is the aligned_256px_landmarks argument?

Thanks in advance!

cnaaq avatar Apr 25 '19 15:04 cnaaq

It's my function from the DeepFaceLab project. aligned_256px_landmarks are the landmarks from an aligned face in a 256x256 image.

Of course, you don't have to align.

iperov avatar Apr 25 '19 15:04 iperov
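
Editor's note on the camera_matrix question: the posted function does not use calibrated intrinsics. It sets the focal length to the image width and puts the principal point at the image centre. A minimal sketch of the same approximation for an arbitrary image size follows; the helper name `approx_camera_matrix` and the 640x480 frame are assumptions made for illustration, not part of the original code.

```python
import numpy as np


def approx_camera_matrix(width, height):
    """Approximate pinhole intrinsics when the real camera is unknown.

    Assumptions: focal length (in pixels) equals the image width, the
    principal point is the image centre, pixels are square, no skew.
    """
    focal_length = float(width)
    cx, cy = width / 2.0, height / 2.0
    return np.array([[focal_length, 0.0, cx],
                     [0.0, focal_length, cy],
                     [0.0, 0.0, 1.0]], dtype=np.float32)


# The 256x256 aligned crop used in the posted function.
K_aligned = approx_camera_matrix(256, 256)

# A hypothetical unaligned 640x480 frame.
K_full = approx_camera_matrix(640, 480)
print(K_aligned)
print(K_full)
```

If the landmarks come from an unaligned image, the matrix would presumably need to be built from that image's size, since the 2D coordinates and the principal point must share one pixel frame.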

Can we say that aligned_256px_landmarks == landmarks_68_3D[ :, 0:2 ], i.e. just omitting the z information?

Sorry, but I didn't get the answer about camera_matrix. Is it only valid for your camera?

cnaaq avatar Apr 25 '19 16:04 cnaaq

Never mind, use any landmarks you want.

I have no camera; I am using faces from the CelebA dataset.

iperov avatar Apr 25 '19 16:04 iperov
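
Editor's note on the 2D versus 3D landmark question: under the pinhole model, the 2D landmarks are not the 3D reference points with z dropped. They are those points after a rotation, a translation and a perspective projection, and solvePnP searches for the rotation and translation that best explain the detected 2D points. The sketch below projects four rows of landmarks_68_3D (indices 8, 30, 36 and 45) by hand with NumPy. The translation of 500 model units, the 30 degree yaw and the function names are assumptions chosen only for illustration.

```python
import numpy as np

# Four rows copied from landmarks_68_3D (indices 8, 30, 36, 45); in the usual
# 68-point layout these are the chin, nose tip and the two outer eye corners.
points_3d = np.array([
    [0.098749, 77.061286, -0.881698],
    [1.226783, -8.414541, -36.948060],
    [-46.047290, -37.471411, 7.037989],
    [45.170805, -38.086515, 4.894163],
])

# Approximate intrinsics for a 256x256 image, as in the posted function.
K = np.array([[256.0, 0.0, 128.0],
              [0.0, 256.0, 128.0],
              [0.0, 0.0, 1.0]])


def project(points, R, t, K):
    """Pinhole projection: rotate, translate, apply K, divide by depth."""
    cam = points @ R.T + t            # model -> camera coordinates
    uvw = cam @ K.T                   # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide -> pixel coordinates


def yaw_matrix(angle_rad):
    """Rotation about the camera's vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Hypothetical pose: head 500 model units in front of the camera.
t = np.array([0.0, 0.0, 500.0])
frontal = project(points_3d, np.eye(3), t, K)
turned = project(points_3d, yaw_matrix(np.radians(30)), t, K)

# Horizontal distance between the two eye corners, in pixels.
frontal_eye_width = frontal[3, 0] - frontal[2, 0]
turned_eye_width = turned[3, 0] - turned[2, 0]
print(frontal_eye_width, turned_eye_width)
```

Turning the head shrinks the projected distance between the eye corners, which is the kind of cue that lets the pose be recovered from 2D landmarks alone.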

OK, I got the theory behind the 3D and 2D landmarks. Just one more question: what is the idea behind these lines?

pitch = np.clip ( pitch*1.25, -1.0, 1.0 )
yaw = np.clip ( yaw*1.25, -1.0, 1.0 )
roll = np.clip ( roll*1.25, -1.0, 1.0 )

Why are the angles between -1 and 1?

cnaaq avatar May 06 '19 10:05 cnaaq

fix it for yourself

iperov avatar May 06 '19 10:05 iperov
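
Editor's note on the scaling question, which the thread leaves unanswered: the Euler angles returned by the decomposition are in radians, so multiplying by 1.25 and clipping to [-1, 1] appears to produce a normalised score that saturates at 1/1.25 = 0.8 rad, roughly 46 degrees. Why that range was chosen is not stated in the thread. For angles in degrees, one option would be to skip the scaling and clipping and convert the raw radians directly. The function names below are illustrative only.

```python
import math


def normalize_angle(angle_rad, scale=1.25):
    """Scale, then clip to [-1, 1], mirroring the np.clip lines in the post."""
    return max(-1.0, min(1.0, angle_rad * scale))


def to_degrees(angle_rad):
    """Plain conversion of an unscaled, unclipped angle to degrees."""
    return math.degrees(angle_rad)


# Angle (in degrees) at which the normalised value reaches +/-1.
saturation_deg = math.degrees(1.0 / 1.25)

print(normalize_angle(0.4))   # inside the linear range
print(normalize_angle(2.0))   # beyond the saturation angle, clipped
print(to_degrees(0.4))
print(saturation_deg)
```

Anything beyond the saturation angle is indistinguishable after clipping, so the normalised values cannot be converted back to true angles for large rotations.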