py-feat

Inconsistent order of detected faces

Open • andrewreece opened this issue 5 months ago • 0 comments

When detect_video() finds multiple faces in a frame, the order of those faces within the frame does not appear to be consistent with their position in the video.

For example, take a recorded video call between two people, where one speaker is in a box on the left and the other is in a box on the right. Each frame has two rows in the dataframe returned by detect_video(), one per face. However, the left speaker is sometimes the first row for a frame and sometimes the second, as the FaceRectX values show:

   frame   FaceRectX  FaceRectY
2     48  418.904871  43.552213
3     48   83.042467  91.174475
4     72   93.583826  92.987578
5     72  421.968295  43.727639
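A quick way to confirm the inconsistency is to check, per frame, whether the rows already run left to right. The sketch below rebuilds only the four rows shown above as a plain pandas DataFrame (it does not call py-feat) and tests whether FaceRectX is increasing within each frame:

```python
import pandas as pd

# The four rows from the detect_video() output above, with the original
# row index (2-5) kept so each row can be traced back.
faces = pd.DataFrame(
    {
        "frame": [48, 48, 72, 72],
        "FaceRectX": [418.904871, 83.042467, 93.583826, 421.968295],
        "FaceRectY": [43.552213, 91.174475, 92.987578, 43.727639],
    },
    index=[2, 3, 4, 5],
)

# True for a frame whose faces already appear left to right.
left_to_right = faces.groupby("frame")["FaceRectX"].apply(
    lambda xs: xs.is_monotonic_increasing
)
print(left_to_right.to_dict())  # {48: False, 72: True}
```

Frame 48 lists the right-hand speaker first and frame 72 lists the left-hand speaker first, which matches the table.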

For a two-speaker video call, it's easy enough to group by frame and sort by FaceRectX (multi-speaker calls could do the same using both FaceRectX and FaceRectY). Maybe consider adding a note to the docs stating that the order of faces within a frame isn't guaranteed?
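The group-and-sort workaround might look like the following minimal sketch, which works on the dataframe after detection and assumes only the frame, FaceRectX and FaceRectY columns shown above. The `face_slot` column and the `row_height` parameter are hypothetical names for illustration, not part of py-feat. With `row_height` set, faces are first bucketed into coarse rows by FaceRectY (top to bottom), which is one way to handle a grid of more than two speakers:

```python
from typing import Optional

import pandas as pd


def order_faces(faces: pd.DataFrame, row_height: Optional[float] = None) -> pd.DataFrame:
    """Return a copy of faces with a per-frame 'face_slot' column (hypothetical).

    Faces are numbered 0, 1, ... left to right by FaceRectX within each frame.
    If row_height is given, faces are first bucketed into rows by
    FaceRectY // row_height, so slots run row by row, left to right.
    """
    out = faces.copy()
    keys = ["frame"]
    if row_height is not None:
        out["_row"] = out["FaceRectY"] // row_height  # coarse row bucket
        keys.append("_row")
    out = out.sort_values(keys + ["FaceRectX"])
    # cumcount numbers rows in their current (sorted) order within each frame.
    out["face_slot"] = out.groupby("frame").cumcount()
    return out.drop(columns="_row", errors="ignore")


faces = pd.DataFrame(
    {
        "frame": [48, 48, 72, 72],
        "FaceRectX": [418.904871, 83.042467, 93.583826, 421.968295],
        "FaceRectY": [43.552213, 91.174475, 92.987578, 43.727639],
    },
    index=[2, 3, 4, 5],
)
ordered = order_faces(faces)
print(ordered[["frame", "face_slot", "FaceRectX"]])
```

On the rows above, slot 0 is always the left speaker (rows 3 and 4) and slot 1 the right speaker (rows 2 and 5). The row bucket has to be coarser than the vertical jitter between boxes in the same row; here the two speakers' FaceRectY values differ by about 48 px, so a `row_height` of 50 would wrongly put them in different rows.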

This issue is related to #198, although it's simpler for the video call use case: heads aren't moving around much, so keeping track of which face is which doesn't require a latent representation.
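Because the speakers stay roughly in place, one option that needs no learned representation is to label each detection by its nearest fixed reference position. This is a sketch of that alternative under stated assumptions. The `anchors` (one FaceRectX/FaceRectY pair per speaker, here taken from frame 48) and the `speaker` column are hypothetical choices for illustration, not anything py-feat provides:

```python
import numpy as np
import pandas as pd


def assign_to_anchors(faces: pd.DataFrame, anchors) -> pd.DataFrame:
    """Label each row with the index of the nearest (FaceRectX, FaceRectY) anchor."""
    xy = faces[["FaceRectX", "FaceRectY"]].to_numpy(dtype=float)
    anchor_xy = np.asarray(anchors, dtype=float)
    # Euclidean distance from every detection to every anchor: (n_faces, n_anchors).
    dists = np.linalg.norm(xy[:, None, :] - anchor_xy[None, :, :], axis=2)
    out = faces.copy()
    out["speaker"] = dists.argmin(axis=1)
    return out


faces = pd.DataFrame(
    {
        "frame": [48, 48, 72, 72],
        "FaceRectX": [418.904871, 83.042467, 93.583826, 421.968295],
        "FaceRectY": [43.552213, 91.174475, 92.987578, 43.727639],
    },
    index=[2, 3, 4, 5],
)
# Hypothetical anchors: left speaker (0) and right speaker (1) from frame 48.
anchors = [(83.042467, 91.174475), (418.904871, 43.552213)]
labeled = assign_to_anchors(faces, anchors)
print(labeled[["frame", "speaker", "FaceRectX"]])
```

Nearest-anchor labeling treats each detection independently, so two faces in the same frame could in principle receive the same label. A per-frame one-to-one matching (for example, the Hungarian algorithm via `scipy.optimize.linear_sum_assignment` on `dists`) would rule that out.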

andrewreece, Sep 11 '24 15:09