virtual_webcam_background
Replace the backend with Mediapipe
Mediapipe runs so much faster. You might want to give it a try. https://google.github.io/mediapipe/
My implementation now uses mediapipe. https://github.com/fangfufu/Linux-Fake-Background-Webcam/
It looks like mediapipe is the best solution. I wonder how this can be combined with python-tf-bodypix (#47) or if they plan to support it.
You don't really need python-tf-bodypix if your goal is just to segment away the background. Mediapipe is super fast.
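For reference, a minimal sketch of what background segmentation with MediaPipe's Python solutions API looks like (the file names and the 0.5 threshold are just placeholders, not values taken from either project):

```python
import cv2
import mediapipe as mp

# Selfie Segmentation from mediapipe's "solutions" API
segmenter = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=0)

frame_bgr = cv2.imread("frame.jpg")                      # placeholder test image
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB
result = segmenter.process(frame_rgb)
mask = result.segmentation_mask                          # float32, HxW, values in [0, 1]

# Simple background removal: keep the person, grey out everything else.
person = mask > 0.5
output = frame_bgr.copy()
output[~person] = (128, 128, 128)
cv2.imwrite("output.jpg", output)
```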
mediapipe often shows large rectangular artifacts. Maybe there are more tuning options?
Have you got a screenshot of "large rectangular artifacts"? I didn't experience any in my own implementation.
I need to do more testing when I have the time for it, but there are (semi-transparent) boxes above the head depending on lighting conditions. The full mobilenet (v2) model with the custom segmentation still works best for segmenting the head, but performs worse on the body, e.g., detecting patterns on the t-shirt as background.
I think it also depends on the resolution of the camera. For mobilenet v2, 4:3 works best, and mediapipe has the general-purpose and the landscape model to experiment with.
You want to use the landscape model with Mediapipe.
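Switching between the two models is a single parameter in the Python solutions API; according to the MediaPipe docs the general model runs on square input and the landscape model on a wider input (not verified here):

```python
import mediapipe as mp

# model_selection=0: general model, model_selection=1: landscape model
general = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=0)
landscape = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1)
```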
I added support for MediaPipe's Selfie Segmentation in my layered-vision project. It does indeed seem to be noticeably faster. It will probably be even faster when using the calculation graph.
I seem to experience the occasional segmentation fault. Have you seen that too?
Otherwise I can also see some differences in performance. It appears slightly worse in the upper region, e.g. with a hat, but better at masking out the background around the neck.
I had no segmentation faults (using tensorflow-cpu), but I see blocky artifacts above the head, with rather large blocks. I need to test it more, but it may be different for 4:3 and landscape resolutions.
I am not sure what you mean with the calculation graph.
For really fast segmentation, one can probably use mediapipe and avoid a lot of numpy arrays when no filters other than segmentation are desired. This may be a bit out of scope for this project (or a nice side-project, virtual-webcam-light), as the focus here is on the flexible filter layers. I could imagine just bundling an additional main file that only contains the bare minimum for segmentation.
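Roughly, such a "virtual-webcam-light" could be as small as the sketch below (assuming pyvirtualcam on the output side; the actual projects talk to v4l2loopback differently, and the resolution, threshold, and plain-black background are placeholders):

```python
import cv2
import mediapipe as mp
import numpy as np
import pyvirtualcam

WIDTH, HEIGHT, FPS = 1280, 720, 30
background = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)  # replace with a loaded RGB image

# Assumes the camera actually delivers the requested resolution.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)

segmenter = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1)

with pyvirtualcam.Camera(width=WIDTH, height=HEIGHT, fps=FPS) as cam:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mask = segmenter.process(rgb).segmentation_mask   # HxW float32
        person = (mask > 0.5)[..., None]                  # HxWx1, broadcasts over channels
        composite = np.where(person, rgb, background)     # keep person, swap background
        cam.send(composite)
        cam.sleep_until_next_frame()
```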
Okay, not sure why I am getting it. It may happen after a few minutes (using Python 3.8). I believe MediaPipe doesn't use TensorFlow. As far as I can tell, there is only one parameter, the model selection (0 or 1).
Yes, I meant trying to get more calculation done in MediaPipe, although I haven't looked into the limitations or how easy it would be to create custom calculators. Currently, combining images is actually taking a good amount of time for me (I am measuring the time taken in each filter / layer for my implementation). For example, bodypix is taking around 16 ms, mediapipe around 6 ms, and creating the composite of images around 11 ms, excluding some of the other filters (erode, dilate, etc.). Maybe I just need more efficient implementations. So overall, using MediaPipe, the frame rate improves from 13 to around 17 fps (significant, but could be more... and I'm not sure it is worth the segmentation faults). In any case, I wasn't suggesting you rewrite your project, just that this seems to be what MediaPipe is mostly designed for.
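The per-layer numbers above come from simple wall-clock measurements; a generic sketch of that kind of timing (the wrapped functions in the usage comment are placeholders, not actual APIs of either project):

```python
import time

def timed(name, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"{name}: {elapsed_ms:.1f} ms")
    return result

# e.g. inside the frame loop:
# mask = timed("segmentation", segmenter.process, rgb).segmentation_mask
# frame = timed("composite", compose, rgb, background, mask)
```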
Open TODOs here:
- mediapipe artifacts: when do they appear (4:3 ratio, low resolution, ...?) and what can be done about them
- A bit of code cleanup of the quick mediapipe integration, e.g., the copies of the mask that are done to get the array into the right shape (see the sketch after this list)
- Make sure all filters detect whether the necessary layers are available and do not crash when using mediapipe instead of one of the models with feature segmentation
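Regarding the mask-copy cleanup item: one possible way to avoid the per-channel copies is to broadcast an HxWx1 view of the mask instead of stacking it three times. This is only an illustration, not the current code in the repository:

```python
import numpy as np

mask = np.random.rand(720, 1280).astype(np.float32)      # stand-in for segmentation_mask
frame = np.zeros((720, 1280, 3), dtype=np.float32)
background = np.ones((720, 1280, 3), dtype=np.float32)

# Copies the mask into a full HxWx3 array:
mask3 = np.stack([mask] * 3, axis=-1)
blended_copy = frame * mask3 + background * (1.0 - mask3)

# Broadcasting an HxWx1 view gives the same result without the copies:
m = mask[..., None]
blended_view = frame * m + background * (1.0 - m)

assert np.allclose(blended_copy, blended_view)
```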
Mediapipe uses Tensorflow. I don't have the 4:3 ratio and low resolution problem in my own implementation.
I don't think there is a meaningful difference in our implementations. Mediapipe gives you a mask and then you apply it to the image. The quality of the mask is not affected by the post processing using filters and so on.
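The erode/dilate/blur filters mentioned above only soften the raw mask before compositing; a generic sketch of that kind of mask post-processing (plain OpenCV, not the actual filter code of either project):

```python
import cv2
import numpy as np

def soften_mask(mask: np.ndarray) -> np.ndarray:
    """mask: float32 HxW in [0, 1], as returned by segmentation_mask."""
    hard = (mask > 0.5).astype(np.uint8)                      # binarise
    kernel = np.ones((5, 5), np.uint8)
    eroded = cv2.erode(hard, kernel, iterations=1)            # shave off halo pixels
    soft = cv2.GaussianBlur(eroded.astype(np.float32), (21, 21), 0)  # soft edge
    return np.clip(soft, 0.0, 1.0)
```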