RealForensics

How are multiple faces handled in preprocessing?

Open • spirosbax opened this issue 2 years ago • 1 comment

Hi, and congratulations on your work! I'm trying to reproduce your results, but I'm having trouble preprocessing the FF++ dataset with your code. I have computed the landmarks for each video as per the instructions. When I run the extract_faces.py script, it fails with the error ValueError: operands could not be broadcast together with shapes (68,2) (136,2) in a line of the crop_patch function. It seems to expect only one set of landmarks per frame. But FF++ and other datasets contain videos with multiple faces, so how is this handled? The code could run if the landmarks were a list of one or more (68,2) arrays instead of a single NumPy array, but since the landmarks are smoothed over time, mixing faces at different locations would produce very jittery crops. How do you handle this case? One option might be to track each face separately and save each one as its own .avi video, e.g. vidname_{0}.avi, vidname_{1}.avi, vidname_{2}.avi, etc.
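The (136,2) array in the error looks like the landmarks of two faces stacked row-wise (136 = 2 × 68). Below is a minimal sketch of the per-face tracking idea, assuming each frame's landmarks are available as a (k, 68, 2) array with one entry per detected face. The helper split_into_tracks and its max_dist threshold are hypothetical, not part of the RealForensics code, and the greedy nearest-centroid matching is a deliberately simple stand-in for a proper face tracker.

```python
import numpy as np


def split_into_tracks(frames, max_dist=50.0):
    """Group per-frame face landmarks into per-face tracks (hypothetical helper).

    frames: list with one array per frame, each of shape (k, 68, 2),
            where k is the number of faces detected in that frame.
    Returns a list of tracks; each track is a list of (frame_index, (68, 2) array).
    """
    tracks = []  # each track: {"last": last centroid, "items": [(t, landmarks), ...]}
    for t, faces in enumerate(frames):
        used = set()  # tracks already matched in this frame
        for lm in faces:
            centroid = lm.mean(axis=0)
            best, best_dist = None, max_dist
            # Greedy matching: pick the closest unused track within max_dist pixels.
            for i, track in enumerate(tracks):
                if i in used:
                    continue
                dist = np.linalg.norm(track["last"] - centroid)
                if dist < best_dist:
                    best, best_dist = i, dist
            if best is None:
                # No nearby track: this face starts a new one.
                tracks.append({"last": centroid, "items": [(t, lm)]})
                best = len(tracks) - 1
            else:
                tracks[best]["last"] = centroid
                tracks[best]["items"].append((t, lm))
            used.add(best)
    return [track["items"] for track in tracks]


# Two synthetic faces drifting apart over three frames.
face_a = np.full((68, 2), 100.0)
face_b = np.full((68, 2), 300.0)
frames = [np.stack([face_a + 2 * t, face_b - 2 * t]) for t in range(3)]
frames[1] = frames[1][::-1]  # detection order changes; matching is by position

tracks = split_into_tracks(frames)
for i, track in enumerate(tracks):
    per_face = np.stack([lm for _, lm in track])  # (T_i, 68, 2), one face only
    print(f"vidname_{i}.avi", per_face.shape)
```

Each track then holds a single face, so it can be smoothed and cropped on its own without mixing locations. A face that leaves the frame produces a track with gaps in its frame indices, which would need handling before smoothing.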

spirosbax • Nov 21 '22 11:11

Hi,

Indeed, the code assumes that each video contains a single face to be extracted. For example, in FF++ only the largest face is tracked and extracted (see Appendix A in https://arxiv.org/pdf/1901.08971.pdf). To extract multiple faces, you would need to track each face and produce landmarks with an extra dimension (e.g., landmarks of shape (3, 68, 2) for a frame with three faces). Then, with slight modifications to the code, it would be possible to crop and align each face.
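As a rough illustration of the two options in the reply, the sketch below assumes per-frame landmarks of shape (k, 68, 2). The helper largest_face approximates "keep only the largest face" by the area of each face's landmark bounding box; that criterion is an assumption, not necessarily the one used for FF++. The helper smooth_landmarks shows why the extra face dimension is convenient: a moving average over the time axis alone smooths every face independently, whether the input is (T, 68, 2) or (T, F, 68, 2). Both helpers, the window size, and the moving-average filter are hypothetical stand-ins, not the smoothing used in the RealForensics code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def largest_face(faces):
    """Pick the face whose landmark bounding box has the largest area.

    faces: array of shape (k, 68, 2) for a single frame.
    Returns the (68, 2) landmarks of the selected face.
    """
    extent = faces.max(axis=1) - faces.min(axis=1)  # (k, 2): box width and height
    areas = extent[:, 0] * extent[:, 1]             # (k,)
    return faces[np.argmax(areas)]


def smooth_landmarks(landmarks, window=5):
    """Centered moving average along the time axis (axis 0).

    landmarks: (T, 68, 2) for one face, or (T, F, 68, 2) for F tracked faces.
    Only axis 0 is averaged, so each face and each point is smoothed on its own.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd so the average stays centered")
    pad = window // 2
    # Repeat the first/last frame so the output keeps T frames.
    pad_width = [(pad, pad)] + [(0, 0)] * (landmarks.ndim - 1)
    padded = np.pad(landmarks, pad_width, mode="edge")
    # sliding_window_view appends the window as the last axis: (T, ..., window).
    return sliding_window_view(padded, window, axis=0).mean(axis=-1)


# Synthetic data: each frame has a small (10x10) and a large (50x50) face.
small = np.array([[0.0, 0.0], [10.0, 10.0]]).repeat(34, axis=0)  # 68 points
large = small * 5                                                 # 68 points
frames = [np.stack([small, large]) for _ in range(8)]             # each: (2, 68, 2)

# Single-face convention: keep only the largest face in every frame.
single = np.stack([largest_face(f) for f in frames])              # (8, 68, 2)

# Multi-face variant: an extra face dimension, smoothed per face.
multi = np.stack(frames)                                          # (8, 2, 68, 2)
print(smooth_landmarks(single).shape, smooth_landmarks(multi).shape)
```

Stacking into (T, F, 68, 2) only works when the same F faces appear, in a consistent order, in every frame; otherwise the faces first need to be associated across frames (for example with a tracker) before stacking.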

ahaliassos • Nov 22 '22 17:11