AppleFaceDetection
face landmark tracking over video
Hi, I'm playing with the Vision framework and can use the face landmark feature to get the positions of facial features in real time. However, I have to run the detector on every frame, which makes the real-time face mask jittery.
Any ideas on how we could optimize landmark detection in a real-time feed with only the iOS frameworks?
FYI I tried the object tracker, but it wasn't as impressive as it could be. Maybe you've had better luck?
thanks
Not a big problem, all you have to do is use VNDetectFaceLandmarksRequest and handle the landmarks you find.
I'll update a new version, you can check it out :)
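In the meantime, the core of it is just a few lines. Here's a minimal, untested sketch (`detectLandmarks` and the prints are placeholders for real handling):

```swift
import Vision
import CoreVideo

// A single request that finds faces and their landmarks in one pass.
let landmarksRequest = VNDetectFaceLandmarksRequest { request, error in
    guard error == nil, let faces = request.results as? [VNFaceObservation] else { return }
    for face in faces {
        // boundingBox is normalized (0...1, origin at the lower left).
        print("face at \(face.boundingBox)")
        if let leftEye = face.landmarks?.leftEye {
            print("left eye points: \(leftEye.normalizedPoints)")
        }
    }
}

// Run it against one frame; pixelBuffer comes from the camera.
func detectLandmarks(in pixelBuffer: CVPixelBuffer) {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([landmarksRequest])
}
```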
BTW, I get every frame in real time from AVCaptureVideoDataOutputSampleBufferDelegate, and that's where I perform my VNRequest.
If you want to read a saved video, that's another story. But you can do the same VNRequest!
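For reference, that delegate callback looks roughly like this (a trimmed sketch, not the exact code from my project; `FrameProcessor` is just a placeholder name):

```swift
import AVFoundation
import Vision

final class FrameProcessor: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let landmarksRequest = VNDetectFaceLandmarksRequest()

    // AVFoundation calls this once per captured frame.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([landmarksRequest])
        // landmarksRequest.results now holds the [VNFaceObservation] for this frame.
    }
}
```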
Yeah, I'm already doing that. But I run the request on every new frame, so the tracking is jittery from frame to frame because we aren't using any data from the previous frame.
Doing a new detection every frame is different from detecting once and then tracking subsequent frames. I can already do the former; I'm trying to see if the latter is possible in the Vision framework because it would perform more smoothly.
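In code, the detect-then-track idea I have in mind would look something like the rough, untested sketch below. `FaceTracker`, the re-detection fallback, and the 0.3 confidence cutoff are all just guesses, but VNTrackObjectRequest is the Vision API that should handle the tracking part:

```swift
import Vision

final class FaceTracker {
    private let sequenceHandler = VNSequenceRequestHandler()
    private var trackingRequest: VNTrackObjectRequest?

    // Frame-by-frame entry point.
    func process(_ pixelBuffer: CVPixelBuffer) {
        if let tracking = trackingRequest {
            // Cheap path: track the face we found earlier.
            try? sequenceHandler.perform([tracking], on: pixelBuffer)
            if let updated = tracking.results?.first as? VNDetectedObjectObservation,
               updated.confidence > 0.3 {
                tracking.inputObservation = updated  // feed the result back in for the next frame
            } else {
                trackingRequest = nil                // lost the face; fall back to detection
            }
        } else {
            // Expensive path: full detection to (re)acquire the face.
            let detect = VNDetectFaceRectanglesRequest()
            try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([detect])
            if let face = detect.results?.first as? VNFaceObservation {
                let request = VNTrackObjectRequest(detectedObjectObservation: face)
                request.trackingLevel = .accurate
                trackingRequest = request
            }
        }
    }
}
```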
Are you doing detection on a saved video?
No, real time.
Oh, I got it. You want to do something like motion tracking (a smooth approach) instead of just detecting from scratch on every single frame.
yea :)
I think we want something like https://github.com/HalfdanJ/ofxFaceTracker2: "The face detection in ofxFT2 is considerably slower then ofxFT, but it can easily run on a background thread. The landmark detection (finding the actual details on the face) is faster and more robust."
Or https://github.com/hrastnik/face_detect_n_track, but with face landmark detection included: "The algorithm I came up with is a hybrid using Haar cascades with template matching as a fallback when Haar fails."
Based on https://developer.apple.com/documentation/vision/vndetectfacelandmarksrequest, splitting the face detection (slow) from the landmark detection (fast) seems possible: "If you've already located all the faces in an image, or want to detect landmarks in only a subset of the faces in the image, set the inputFaceObservations"
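So in theory the split could look something like this (an unverified sketch; the run-full-detection-every-15-frames cadence is arbitrary):

```swift
import Vision

var lastFaces: [VNFaceObservation] = []
var frameCount = 0

func processFrame(_ pixelBuffer: CVPixelBuffer) {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    let landmarks = VNDetectFaceLandmarksRequest()

    frameCount += 1
    if lastFaces.isEmpty || frameCount % 15 == 0 {
        // Slow path every ~15 frames: let Vision find the face rectangles itself.
        try? handler.perform([landmarks])
    } else {
        // Fast path: skip rectangle detection and only refine landmarks
        // inside the boxes we already know about.
        landmarks.inputFaceObservations = lastFaces
        try? handler.perform([landmarks])
    }
    lastFaces = landmarks.results as? [VNFaceObservation] ?? []
}
```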
There are a lot of vision libraries out there, the Google Vision API included. I don't know exactly what the difference is between the Vision framework and these libs, but these APIs are all based on single-image input: you have to feed in an image every single time/frame.
Also, I tested the demo video from dlib C++ with my app (updated, with landmarks). It works pretty well.
The VNDetectFaceLandmarksRequest docs just say that you can either use face observations output by a VNDetectFaceRectanglesRequest or manually create VNFaceObservation instances with the bounding boxes of the faces you want to analyze.
In my project, I just take the first option. You can definitely run it on a background thread. However, if you want to update the UI (draw landmarks on screen), you have to do that on the main thread.
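Roughly like this (a simplified sketch; the queue label and the print standing in for drawing are placeholders):

```swift
import Foundation
import Vision

let visionQueue = DispatchQueue(label: "vision.requests") // serial background queue

func handle(_ pixelBuffer: CVPixelBuffer) {
    visionQueue.async {
        // The heavy Vision work stays off the main thread.
        let request = VNDetectFaceLandmarksRequest()
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
        let faces = request.results as? [VNFaceObservation] ?? []

        DispatchQueue.main.async {
            // Drawing must happen on the main thread;
            // replace this print with your overlay-view update.
            print("drawing \(faces.count) face(s)")
        }
    }
}
```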
For real-time video you also need to take into account the difference between the video frame rate and the Vision API sample rate. Some of the jitter you're experiencing could also be caused by the face bounding box / landmark updates running at a lower frequency than the video; that's a performance issue. For 30fps you need the Vision API to update in less than 0.033 sec, plus take into account the context switch to the main queue for drawing.
Hi shaibt, I am a newbie iOS developer, so could you please show me how to do this: "For 30fps you need the Vision API to update in less than 0.033 sec + take into account context switch to main Q for drawing"? Thanks, shaibt.
Hi hanoi2018,
To be clear, what I meant is that your device has to perform face detection in under 1/30 sec for it to run on all frames at 30fps. That mainly depends on your device's processing power and Apple's SW/HW optimisations; there's probably little you could do yourself to speed it up. I haven't tested with an iPhone X/8 yet, so I don't know what the peak Vision API performance is on those devices.
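If you want to see where your device stands, timing the request is straightforward. Something like this sketch (`timedDetect` is just an illustrative name):

```swift
import Foundation
import Vision

func timedDetect(_ pixelBuffer: CVPixelBuffer) {
    let request = VNDetectFaceLandmarksRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])

    let start = CFAbsoluteTimeGetCurrent()
    try? handler.perform([request])
    let ms = (CFAbsoluteTimeGetCurrent() - start) * 1000

    // 30 fps leaves about 33.3 ms per frame; anything above that means
    // detection can't keep up with every frame on this device.
    print(String(format: "Vision took %.1f ms (budget ~33.3 ms at 30 fps)", ms))
}
```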
Hi shaibt, thanks for responding.
Have you achieved your goal of dealing with the jittering? I'm hoping you can share your methods. Thank you.
Hi ailias, I haven't achieved it yet.
@stanchiang hello, have you had any luck with this issue? I read through this thread and tried using the previous frame's face rectangle for the analysis of the next frame, but the result is not satisfying.
Same issue here: I'm processing every CVPixelBuffer with VNDetectFaceRectanglesRequest and saving it to disk with a blur filter applied. This works really well on an iPhone XS, but it won't perform well on a regular iPad. Any recommendations?
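For reference, my per-frame processing looks roughly like this (a simplified sketch, not my exact code; the blur radius and names are illustrative):

```swift
import Vision
import CoreImage

// Detect faces in one frame and blur only those regions.
func blurFaces(in pixelBuffer: CVPixelBuffer) -> CIImage {
    let request = VNDetectFaceRectanglesRequest()
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])

    let frame = CIImage(cvPixelBuffer: pixelBuffer)
    var output = frame
    let w = Int(frame.extent.width), h = Int(frame.extent.height)

    for face in request.results as? [VNFaceObservation] ?? [] {
        // Vision boxes are normalized; convert to pixel coordinates.
        let faceRect = VNImageRectForNormalizedRect(face.boundingBox, w, h)
        let blurredFace = frame
            .clampedToExtent()
            .applyingFilter("CIGaussianBlur", parameters: [kCIInputRadiusKey: 18])
            .cropped(to: faceRect)
        output = blurredFace.composited(over: output)
    }
    return output
}
```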