
Replace the backend with Mediapipe

fangfufu opened this issue 3 years ago • 12 comments

Mediapipe runs so much faster. You might want to give it a try. https://google.github.io/mediapipe/

My implementation now uses mediapipe. https://github.com/fangfufu/Linux-Fake-Background-Webcam/
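For reference, a minimal sketch of the MediaPipe Python API in question (the webcam index and colour handling are illustrative assumptions, not code from either project):

```python
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # assumed webcam index
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("could not read a frame from the camera")

with mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=0) as segmenter:
    # MediaPipe expects RGB input; OpenCV captures BGR
    results = segmenter.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    # segmentation_mask is a float32 HxW array with values in [0, 1]
    mask = results.segmentation_mask
```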

fangfufu avatar Jun 07 '21 01:06 fangfufu

It looks like mediapipe is the best solution. I wonder how this can be combined with python-tf-bodypix (#47) or if they plan to support it.

allo- avatar Jun 17 '21 13:06 allo-

You don't really need python-tf-bodypix, if your goal is to just segment away the background. Mediapipe is super fast.

fangfufu avatar Jun 17 '21 18:06 fangfufu

mediapipe often shows large rectangular artifacts. Maybe there are more tuning options?

allo- avatar Jun 24 '21 11:06 allo-

Have you got a screenshot of "large rectangular artifacts"? I didn't experience any in my own implementation.

fangfufu avatar Jun 24 '21 11:06 fangfufu

I need to run more tests when I have the time, but there are (semi-transparent) boxes above the head depending on lighting conditions. The full MobileNet (v2) model with the custom segmentation still works best for segmenting the head, but performs worse on the body, e.g., detecting patterns on the t-shirt as background.

I think it also depends on the resolution of the camera. For MobileNet v2, 4:3 works best, and MediaPipe has the general-purpose and the landscape model to experiment with.

allo- avatar Jun 24 '21 11:06 allo-

You want to use the landscape model with Mediapipe.
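In the Python API the landscape model is selected with the single model_selection constructor parameter; a minimal sketch following the solution docs:

```python
import mediapipe as mp

# model_selection=0: general model; model_selection=1: landscape model
segmenter = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1)
```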

fangfufu avatar Jun 24 '21 11:06 fangfufu

I added support for MediaPipe's Selfie Segmentation in my layered-vision project. It does indeed seem to be noticeably faster. It will probably be even faster when using the calculation graph. I seem to experience the occasional segmentation fault. Have you seen that too?

Otherwise I can also see some differences in segmentation quality. It appears slightly worse in the upper region, e.g. with a hat, but better at masking out the background around the neck.

de-code avatar Jul 04 '21 00:07 de-code

I had no segmentation faults (using tensorflow-cpu), but I see blocky artifacts above the head, with rather large blocks. I need to test it more, but it may be different for 4:3 and landscape resolutions.

I am not sure what you mean by the calculation graph.

For really fast segmentation, one could probably use MediaPipe and avoid a lot of NumPy arrays when no filters other than segmentation are desired. This may be a bit out of scope for this project (or a nice side project, virtual-webcam-light), as the focus here is on the flexible filter layers. I could imagine just bundling an additional main file that only contains the bare minimum for segmentation.
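A rough sketch of what such a bare-minimum loop might look like, assuming pyfakewebcam for output (the device path, resolution, and plain-black background are placeholders):

```python
import cv2
import mediapipe as mp
import numpy as np
import pyfakewebcam

WIDTH, HEIGHT = 1280, 720  # placeholder resolution

cap = cv2.VideoCapture(0)  # assumed real camera
cap.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)
fake = pyfakewebcam.FakeWebcam('/dev/video20', WIDTH, HEIGHT)  # assumed loopback device
background = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)      # plain black background

with mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1) as segmenter:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mask = segmenter.process(rgb).segmentation_mask
        # Single thresholded composite, no further filter layers
        condition = np.stack((mask,) * 3, axis=-1) > 0.5
        fake.schedule_frame(np.where(condition, rgb, background))
```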

allo- avatar Jul 04 '21 00:07 allo-

> I had no segmentation faults (using tensorflow-cpu), but I see blocky artifacts above the head, with rather large blocks. I need to test it more, but it may be different for 4:3 and landscape resolutions.

Okay, not sure why I am getting it. It may happen after a few minutes (using Python 3.8). I believe MediaPipe doesn't use TensorFlow. As far as I can tell, there is only one parameter, the model selection (0 or 1).

> I am not sure what you mean by the calculation graph.

> For really fast segmentation, one could probably use MediaPipe and avoid a lot of NumPy arrays when no filters other than segmentation are desired. This may be a bit out of scope for this project (or a nice side project, virtual-webcam-light), as the focus here is on the flexible filter layers. I could imagine just bundling an additional main file that only contains the bare minimum for segmentation.

Yes, I meant trying to get more of the calculation done in MediaPipe, although I haven't looked into the limitations or how easy it would be to create custom calculators. Currently, combining images is actually taking a good amount of time for me (I am measuring the time taken in each filter / layer in my implementation). For example, bodypix takes around 16 ms, mediapipe around 6 ms, and creating the composite of images around 11 ms, and that is excluding some of the other filters (erode, dilate, etc.). Maybe I just need more efficient implementations. So overall, using MediaPipe, the frame rate improves from 13 to around 17 fps for me (significant, but could be more... and I am not sure it is worth the segmentation faults). In any case, I wasn't suggesting that you rewrite your project, just that this seems to be what MediaPipe is mostly designed for.
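For reference, the composite step measured above is essentially the thresholded np.where used in the MediaPipe solution examples; a sketch (the 0.5 threshold follows those examples, and the function name is made up):

```python
import numpy as np

def composite(frame, background, mask, threshold=0.5):
    """Keep frame pixels where the mask exceeds the threshold, else background."""
    # Broadcast the HxW float mask to HxWx3 so the condition
    # matches the image shape, then pick per pixel.
    condition = np.stack((mask,) * 3, axis=-1) > threshold
    return np.where(condition, frame, background)
```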

de-code avatar Jul 04 '21 01:07 de-code

Open TODOs here:

  • MediaPipe artifacts: when do they appear (4:3 ratio, low resolution, ...?) and what can be done about them
  • A bit of code cleanup of the quick MediaPipe integration, e.g., the mask copies that are made so the array has the right shape (see the sketch after this list)
  • Make sure all filters detect whether the necessary layers are available and do not crash when using MediaPipe instead of one of the models with feature segmentation
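On the mask-copy TODO above: NumPy broadcasting can often replace the copies, since an HxW mask with a trailing axis blends against an HxWx3 frame without being materialised per channel; a sketch (not the project's current code):

```python
import numpy as np

def blend(frame, background, mask):
    """Alpha-blend frame over background using an HxW float mask in [0, 1]."""
    # A trailing axis turns the HxW mask into an HxWx1 view; NumPy then
    # broadcasts it across the colour channels without copying it three times.
    alpha = mask[:, :, np.newaxis]
    return (alpha * frame + (1.0 - alpha) * background).astype(np.uint8)
```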

allo- avatar Jul 08 '21 18:07 allo-

Mediapipe uses Tensorflow. I don't have the 4:3 ratio and low resolution problem in my own implementation.

fangfufu avatar Jul 08 '21 19:07 fangfufu

I don't think there is a meaningful difference between our implementations. MediaPipe gives you a mask, and then you apply it to the image. The quality of the mask is not affected by the post-processing with filters and so on.

allo- avatar Jul 09 '21 09:07 allo-