backscrub
Other background replacement code bases - a round up
Having recently discovered that the open source Jitsi video conferencing solution offers ML-driven background replacement, I thought it would be interesting to round up who else is doing this here on GitHub, and what tech is used.
- Search: https://github.com/search?p=13&q=virtual+background&type=Repositories
- 187 results! With a bit of duplicate removal and filtering by star rating, the notable ones are:
- Jitsi-Meet, 100% client-side, tflite.js (optionally compiled to WASM, optionally with SIMD support), using BodyPix models.
- ViBa, a tidy 100% python re-working of the original mixed-tech solution by Ben Elder, using python3-tensorflow, python3-opencv & BodyPix models.
- Volcomix virtual background, the inspiration for the Jitsi team, 100% client-side, using tflite.js (compiled to WASM, SIMD required) and either BodyPix or MediaPipe Meet models. Really well documented and tested.
- EasyJitsi, 100% client-side React app, using tf.js and BodyPix. Small, nice demo site, but slow (3 FPS on my laptop).
- VirtBG, 100% client-side, single file implementation, using tf.js, BodyPix. Similar performance to EasyJitsi above as expected. Great example of minimal bloat though!
Volcomix virtual background seems to give great results, even on my bad laptop camera, while also running at 60 FPS. Would it be possible to do something similar here?
@MartinKlevs glad to hear it! On the surface, both Volcomix and Deepbacksub operate in a similar fashion, using the same Google models to detect a person, but there are a few differences. In particular: the use of async rendering in the Volcomix solution (which will arrive here with https://github.com/floe/deepbacksub/pull/59), and the use of the browser's 2D or WebGL canvas in place of OpenCV for the other image processing, which will use a GPU where available and likely reduce CPU loading a little, quite possibly making up for the use of WASM and Emscripten-compiled C++ TensorFlow :)
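To make the async idea concrete, here is a minimal Python sketch of the pattern (not backscrub's or Volcomix's actual code; `segment()` is a hypothetical placeholder for the real inference step): segmentation runs on a worker thread, while the capture loop always composites with the most recent completed mask instead of blocking on the model.

```python
# Minimal sketch of async segmentation: the render loop never blocks on the
# model, it just reuses the latest completed mask. segment() is a hypothetical
# placeholder for the real TFLite inference step.
import threading

import cv2
import numpy as np

latest_mask = None            # most recent mask produced by the worker
pending = None                # newest frame waiting to be segmented
cond = threading.Condition()

def segment(frame):
    # placeholder: run the model here and return a float32 [0, 1] mask
    return np.ones(frame.shape[:2], dtype=np.float32)

def worker():
    global latest_mask, pending
    while True:
        with cond:
            while pending is None:
                cond.wait()
            frame, pending = pending, None
        mask = segment(frame)   # the slow step, off the render thread
        with cond:
            latest_mask = mask

threading.Thread(target=worker, daemon=True).start()

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    with cond:
        pending = frame.copy()  # hand the newest frame to the worker
        cond.notify()
        mask = latest_mask
    if mask is not None:
        m = mask[..., None]     # broadcast over the colour channels
        background = np.zeros_like(frame)  # stand-in for the replacement image
        frame = (frame * m + background * (1 - m)).astype(np.uint8)
    cv2.imshow("output", frame)
    if cv2.waitKey(1) == 27:    # Esc quits
        break
```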
Volcomix seems to do a much better job overall. It also segments the whole image instead of a cropped area.
Found this just now: https://developers.google.com/ml-kit/vision/selfie-segmentation Seems to be Apache-licensed (for now); the problem is actually getting the model file, which is buried inside the MLKit runtime.
Update: you can get the AAR file (which is just a Zip) via https://mvnrepository.com/artifact/com.google.mlkit/segmentation-selfie/16.0.0-beta1 - and there is indeed a .tflite file in there, which should be worth a try (it also has a square input shape, so it should fit a landscape camera image somewhat better than the portrait-shaped Meet model).
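For anyone who wants to try: since the AAR is just a Zip, a few lines of Python are enough to dig the model out (the path of the model inside the archive isn't guaranteed, so this searches rather than hard-coding it):

```python
# Sketch: find and extract any .tflite model buried in the MLKit AAR.
import zipfile

AAR = "segmentation-selfie-16.0.0-beta1.aar"  # file name may differ by download

with zipfile.ZipFile(AAR) as z:
    for name in z.namelist():
        if name.endswith(".tflite"):
            print("found model:", name)
            z.extract(name, "extracted/")
```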
Nice! I was able to get it working with minimal adjustments. The new model outputs a [0, 1] float32 mask. It seems to do a good job.
Yes, quick-and-dirty implementation in https://github.com/floe/deepbacksub/commit/24dc33fcf1bc562754ce79e0bc61e8343ffbd47b - seems to be a candidate for the new default model?
I agree. Personally, I get better results with the threshold set to 0.75.
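For illustration, applying that threshold to the model's [0, 1] float32 output is only a couple of lines; a rough numpy/OpenCV sketch (names are illustrative, not backscrub's actual code):

```python
# Sketch: binarise the selfie model's float32 output at 0.75, then scale the
# mask back up to the camera frame size for compositing.
import cv2
import numpy as np

def to_mask(model_output: np.ndarray, frame_shape, threshold: float = 0.75) -> np.ndarray:
    mask = (model_output > threshold).astype(np.uint8) * 255
    # the model output is square (e.g. 256x256); cv2.resize takes (width, height)
    return cv2.resize(mask, (frame_shape[1], frame_shape[0]))
```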
Some of my bookmarked links:
https://github.com/murari023/awesome-background-subtraction
https://github.com/SwatiModi/virtual-background-app
https://github.com/fangfufu/Linux-Fake-Background-Webcam
https://github.com/PapaEcureuil/pyfakebg
> Yes, quick-and-dirty implementation in 24dc33f - seems to be a candidate for the new default model?
Can you provide example screencaps?
Three really quick-and-dirty (again) screenshots, in this order: new selfie model, Meet model, DeepLabv3+.

@insad thanks for those - the list of papers is excellent! I came across the @SwatiModi (Android-targeted, using MediaPipe) and @fangfufu (Python+Node.js, derived from Ben Elder's original work) projects in my search; I hadn't found @PapaEcureuil's, where Streamlit is used to put a nice GUI on the Python+TensorFlow (full-fat) engine.
Did you see this one: https://github.com/ZHKKKe/MODNet ?
A lot of active development going on, it seems; sadly, a lot of the communication is in Chinese only...
https://github.com/PeterL1n/BackgroundMattingV2
https://github.com/YexingWan/Fast-Portrait-Segmentation
https://github.com/mrgloom/awesome-semantic-segmentation
https://github.com/wpf535236337/real-time-network
https://github.com/josephch405/jit-masker
https://github.com/clovaai/ext_portrait_segmentation
The new selfie model sure seems promising. What I think is still missing is overlapping masks from multiple NN runs to cover the whole image area.
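A rough sketch of what that could look like (hypothetical; assumes a square-input model and a landscape frame): cover the frame with overlapping square crops, run the net on each, and merge the masks where the crops overlap.

```python
# Sketch: build a full-frame mask from several overlapping square crops.
# run_model() is a hypothetical placeholder for one NN inference.
import numpy as np

def run_model(crop: np.ndarray) -> np.ndarray:
    # placeholder: resize crop to the model input, infer, return a float mask
    return np.zeros(crop.shape[:2], dtype=np.float32)

def full_frame_mask(frame: np.ndarray, overlap: int = 64) -> np.ndarray:
    h, w = frame.shape[:2]      # assumes landscape, i.e. w >= h > overlap
    size = h                    # square crops, full frame height
    xs = list(range(0, w - size, size - overlap)) + [w - size]
    mask = np.zeros((h, w), dtype=np.float32)
    for x in xs:
        crop = frame[:, x:x + size]
        # keep the strongest prediction where crops overlap
        mask[:, x:x + size] = np.maximum(mask[:, x:x + size], run_model(crop))
    return mask
```

Each extra crop is another inference per frame, of course, so this trades FPS for coverage.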
> Did you see this one: https://github.com/ZHKKKe/MODNet ?
It is used in this plugin for OBS: https://github.com/royshil/obs-backgroundremoval
@progandy - thanks, I note from Roy's README.md and a quick look at the code that his filter uses Microsoft's ONNX Runtime C++ wrapper for multiple possible ML frameworks (https://github.com/microsoft/onnxruntime), then borrows the pretrained ONNX model from https://github.com/ZHKKKe/MODNet (actually their Google Drive), but not their Python :wink:
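For the curious, driving that ONNX model from Python is only a few lines with the onnxruntime bindings; a hedged sketch (the input name, normalisation, and single-output assumption should all be checked against the actual model, e.g. with Netron):

```python
# Sketch: run a portrait-matting ONNX model (e.g. MODNet) via onnxruntime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("modnet.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def infer(frame_rgb: np.ndarray) -> np.ndarray:
    # NCHW float tensor, scaled to [-1, 1] (MODNet's preprocessing, assumed)
    x = frame_rgb.astype(np.float32) / 127.5 - 1.0
    x = x.transpose(2, 0, 1)[None]
    matte = session.run(None, {input_name: x})[0]
    return matte.squeeze()  # float alpha matte, same spatial size as input
```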
@phlash Please build backscrub as a free plugin for OBS Studio
@elkhalafy I have an experimental OBS plugin that uses backscrub here: https://github.com/phlash/obs-backscrub
This builds against the experimental branch of backscrub, where the core functionality is separated into a library and deepseg is a wrapper around it (as is the OBS plugin).
I hope you complete the project and release it as an actual plugin; we need it so much. @phlash
@floe the new model looks great. I think there's a place for larger models as well, like the one from https://github.com/PeterL1n/BackgroundMattingV2, although I'm not sure what the status of GPU acceleration is in backscrub, since I haven't personally used XNNPACK.
@dsingal0 No GPU acceleration in backscrub as yet; XNNPACK provides CPU-optimised kernels for TFLite. That said, GPU works[citation needed] via the TFLite GPU delegate and OpenCL in my hacked-up branch here: https://github.com/phlash/backscrub/tree/xnnpack-test - according to one tester :smile:
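For reference, loading a delegate through the TFLite Python API looks roughly like this (the delegate library name varies by build; the one below is an assumption for a Linux OpenCL build):

```python
# Sketch: load a TFLite model with the GPU delegate enabled.
import tensorflow as tf

# library name is an assumption; it depends on how the delegate was built
delegate = tf.lite.experimental.load_delegate("libtensorflowlite_gpu_delegate.so")
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",          # illustrative path
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```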
I would be interested to try the larger models from Peter Lin's paper; it looks like the ONNX ones are where we should start, which then need converting to TFLite through TF (apparently): https://stackoverflow.com/questions/53182177/how-do-you-convert-a-onnx-to-tflite
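Per that Stack Overflow thread, the usual route is ONNX to a TensorFlow SavedModel (via the onnx-tf package), then SavedModel to TFLite; a rough sketch, untested against these particular models:

```python
# Sketch: ONNX -> TF SavedModel -> TFLite. Whether BackgroundMattingV2's ops
# all convert cleanly is untested here.
import onnx
import tensorflow as tf
from onnx_tf.backend import prepare

# export the ONNX graph as a TensorFlow SavedModel directory
prepare(onnx.load("model.onnx")).export_graph("saved_model")

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
with open("model.tflite", "wb") as f:
    f.write(converter.convert())
```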
@phlash if going for GPU acceleration, TensorRT would be great for NVIDIA GPUs, since they have ONNX->TensorRT converters. I tried out the current models in the repo, and all except DeepLabv3 and MLKit Segmentation were quite unusable. https://github.com/ZHKKKe/MODNet looks very promising based on their Colab. It's heavier than the tflite models, but much lighter than BackgroundMattingV2, so it can feasibly run on Intel non-U-series CPUs or a dGPU.
Just wanted to mention that Zoom now has some kind of ML segmentation in their Linux client (Version 5.7.6 - 31792.0820), too, and it's quite performant. Curious if someone is up for reverse engineering it.