@mediapipe/tasks-vision background replacement quality
Hello, I rewrote the @mediapipe/selfie_segmentation background replacement to use @mediapipe/tasks-vision, and the quality of the background replacement is noticeably worse. I used the code from the official documentation as an example and changed the part responsible for segmentation: https://codepen.io/volodymyrl/pen/mdQKMdR?editors=0010. Is something wrong with my code, or is this how the model works (and if so, why is it worse than the deprecated @mediapipe/selfie_segmentation)?
@volodymyrl,
Could you please share a reference example comparing the quality against the new tasks-vision API? That would be helpful for bringing it to internal notice. Thank you.
Hey @kuaashish, thanks for your answer. This is an example of background replacement with @mediapipe/selfie_segmentation: https://codepen.io/Guimauve01/pen/wvEaVrN
You should use confidenceMask instead of categoryMask to make the edge smoother. Check out this example: https://codepen.io/khanhlvg/full/WNYaqNW
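For reference, here is a minimal sketch of how those output flags can be set when creating the segmenter; the model path and wasm URL are placeholders, and the option names are assumed from the tasks-vision ImageSegmenter documentation:

```
// Sketch only: prefer soft confidence masks over the hard category mask.
import { FilesetResolver, ImageSegmenter } from '@mediapipe/tasks-vision';

const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);
const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: { modelAssetPath: 'selfie_segmenter.tflite', delegate: 'GPU' },
  runningMode: 'VIDEO',
  outputConfidenceMasks: true,   // per-pixel 0..1 values give smoother edges
  outputCategoryMasks: false,    // hard 0/1 labels tend to produce torn edges
});
```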
Also, I had to write a GLSL shader to prevent GPU -> CPU transfers and get performance on par with selfie_segmentation.
@kuaashish @khanhlvg, thanks for the example. I used it to prepare another example that combines both versions: https://codepen.io/volodymyrl/pen/VwVVjxd?editors=1111. With @mediapipe/selfie_segmentation on the left and @mediapipe/tasks-vision on the right, you can see that @mediapipe/tasks-vision is still worse (the image has torn edges).
@satoren, sorry, I am not familiar with GLSL shaders. Can you please explain what I need to do to improve MediaPipe performance?
The difference you're seeing is mostly caused by the visualization logic. I tweaked the edge-smoothing logic a bit to reduce the visibility of the edge. If you set minConfidence = 0 and maxConfidence = 1, you'll get the same result as the legacy selfie segmentation SDK.
Besides, in the legacy SDK you're using the square-input model (modelSelection: 0). If your input is a landscape image, you should switch that to 1.
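For the legacy SDK, that switch is a one-line option change; a sketch, assuming an existing SelfieSegmentation instance named selfieSegmentation:

```
// Legacy @mediapipe/selfie_segmentation: 0 = square (general) model, 1 = landscape model.
selfieSegmentation.setOptions({ modelSelection: 1 });
```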
@volodymyrl

> Can you please explain what I need to do to improve MediaPipe performance?

I'm sorry I can't post the code, but I hope this gives you a hint.
For best performance, I needed to get the mask as a mask image, like the selfie segmentation. See MPMask comments.
Here is the approach I took.
- Check which type the MPMask holds.
  - If hasUint8Array():
    - Create a new ImageData and copy the data from mask.getAsUint8Array().
    - Convert it to an ImageBitmap using createImageBitmap for use with CanvasRenderingContext2D.drawImage.
  - If hasFloat32Array():
    - Almost the same as hasUint8Array(), except that getAsFloat32Array() is used.
  - If hasWebGLTexture():
    - Use getAsWebGLTexture() to get the texture and render it to a canvas using WebGL. The part of MPImage (a class similar to MPMask) that converts to ImageBitmap may be helpful.
    - Convert the canvas output to an ImageBitmap using createImageBitmap for use with CanvasRenderingContext2D.drawImage.
- Use the resulting ImageBitmap like segmentationMask in selfieSegmentation. (A rough sketch of the two CPU-array branches follows below.)
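A minimal sketch of the two CPU-array branches above, assuming a single-channel confidence mask and writing the confidence into the alpha channel; the function name and the channel choice are illustrative, not part of the MediaPipe API:

```
// Sketch only: convert a CPU-backed MPMask to an ImageBitmap.
async function maskToImageBitmap(mask) {
  const { width, height } = mask;
  const rgba = new Uint8ClampedArray(width * height * 4);

  if (mask.hasUint8Array()) {
    const data = mask.getAsUint8Array();           // values 0..255
    for (let i = 0; i < data.length; i++) {
      rgba[i * 4 + 3] = data[i];                   // confidence -> alpha
    }
  } else if (mask.hasFloat32Array()) {
    const data = mask.getAsFloat32Array();         // values 0..1
    for (let i = 0; i < data.length; i++) {
      rgba[i * 4 + 3] = Math.round(data[i] * 255);
    }
  } else {
    // GPU-backed masks are better rendered via getAsWebGLTexture().
    throw new Error('Mask is GPU-backed; use getAsWebGLTexture() instead.');
  }

  const imageData = new ImageData(rgba, width, height);
  return createImageBitmap(imageData);             // usable with drawImage()
}
```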
Since the above steps are a bit cumbersome, I think one option is to continue using selfie segmentation until an easy-to-use environment is available.
This may be best discussed on #4491.
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.
@satoren @khanhlvg, thanks for your answers! I updated the background-replacement function, and it looks acceptable, but still worse than @mediapipe/selfie_segmentation, not to mention how background replacement works in, for example, Zoom. @satoren, I can't use @mediapipe/selfie_segmentation because it throws errors when you try to stop it, and the only suggested fix was to upgrade to the tasks version (https://github.com/google/mediapipe/issues/3373).
@khanhlvg,
Could you please look into this issue? Thank you
@volodymyrl How about this? https://codepen.io/satoren/pen/rNQXRqp This is running on the CPU and not optimized for performance as https://github.com/google/mediapipe/issues/4630#issuecomment-1657373951.
Hey @satoren, thanks for your answer.

> https://codepen.io/satoren/pen/rNQXRqp

In this example, the quality looks the same as with @selfie_segmentation.

> How about this? This is running on the CPU and not optimized for performance as https://github.com/google/mediapipe/issues/4630#issuecomment-1657373951.
I use the second approach, with getAsFloat32Array, but it is slow. To improve performance, I tried using a Web Worker, but there are some limitations on the data you can post from a Web Worker: https://github.com/google/mediapipe/issues/4694.
You can transfer it by converting it to an ImageBitmap.
Also, using a Web Worker won't make it any faster. The bottleneck here is the transfer from GPU to CPU. Try to find an efficient way to convert the WebGLTexture to an ImageBitmap.
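For reference, a tiny sketch of transferring an ImageBitmap to a worker without copying; `worker` and `bitmap` are placeholder names:

```
// Main thread: transfer (not clone) the ImageBitmap to the worker.
worker.postMessage({ bitmap }, [bitmap]);

// Worker: receive it and use it directly.
self.onmessage = (e) => {
  const { bitmap } = e.data;
  // e.g. offscreenCtx.drawImage(bitmap, 0, 0);
};
```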
+1 to everything satoren@ said :). In particular, generally try to keep things on GPU for best performance*. This is especially true with segmentation, since that can return an image and therefore can be run 100% on GPU, allowing for nice pipelining. So here's a quick summary for best performance for segmentation:

- Run with the 'GPU' delegate.
- When running with the 'GPU' delegate, avoid using any getAs*Array() calls; exclusively use getAsWebGLTexture() instead for best performance.
- Prefer confidence masks without category masks, if that works for your use case, as those are cheaper to compute and provide more information.
- Using the above means you'll need to write a GL shader to render the texture to your OffscreenCanvas (a rough sketch of this follows below); if you need it as an ImageBitmap after that, you can then call transferToImageBitmap (createImageBitmap is async, so it could introduce a tiny bit of extra delay).
- The MPMask code and ImageShaderContext code do have examples of shader applications, but honestly, since you're just wanting a simple passthrough shader, a WebGL shader tutorial would probably give you better examples for your purposes; hopefully we'll have some better examples of this shortly.

*Technically, there are some cases where you'll get better results on CPU (or even have to use CPU); for example, if you need to run 10 things at once, you'd want CPU ML inference for most of them, since that parallelizes better than GPU ML inference in the browser.
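A rough sketch of such a passthrough render, assuming a WebGL2 context obtained from the same OffscreenCanvas that was passed to the task (so the mask texture belongs to that context); all names here are illustrative, and MediaPipe's own GL state on the shared context may still need to be saved and restored around this draw:

```
// Sketch only: render a confidence-mask WebGLTexture with a passthrough shader and
// export it synchronously via transferToImageBitmap().
const offscreen = new OffscreenCanvas(640, 480);
const gl = offscreen.getContext('webgl2');

const VS = `#version 300 es
in vec2 pos;
out vec2 uv;
void main() {
  uv = pos * 0.5 + 0.5;
  gl_Position = vec4(pos, 0.0, 1.0);
}`;
const FS = `#version 300 es
precision mediump float;
uniform sampler2D maskTex;
in vec2 uv;
out vec4 outColor;
void main() {
  float m = texture(maskTex, uv).r;   // confidence lives in the red channel
  outColor = vec4(m, m, m, 1.0);
}`;

function compile(type, src) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, src);
  gl.compileShader(shader);
  return shader;
}
const program = gl.createProgram();
gl.attachShader(program, compile(gl.VERTEX_SHADER, VS));
gl.attachShader(program, compile(gl.FRAGMENT_SHADER, FS));
gl.linkProgram(program);

// Full-screen quad as a triangle strip.
const quad = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, quad);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([-1, -1, 1, -1, -1, 1, 1, 1]), gl.STATIC_DRAW);

function maskTextureToImageBitmap(texture) {
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);        // draw to the canvas itself
  gl.viewport(0, 0, offscreen.width, offscreen.height);
  gl.useProgram(program);
  const pos = gl.getAttribLocation(program, 'pos');
  gl.bindBuffer(gl.ARRAY_BUFFER, quad);
  gl.enableVertexAttribArray(pos);
  gl.vertexAttribPointer(pos, 2, gl.FLOAT, false, 0, 0);
  gl.activeTexture(gl.TEXTURE0);
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.uniform1i(gl.getUniformLocation(program, 'maskTex'), 0);
  gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
  return offscreen.transferToImageBitmap();        // synchronous, unlike createImageBitmap
}
```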
@satoren @tyrmullen thank you for the detailed explanation. If getAsWebGLTexture() works better than getAs*Array(), it would be great to have an example of its usage.
While it's straightforward to set up a shader to render a full-screen texture, the main problem is that if you're receiving a WebGLTexture, you need to use the same GL context that MediaPipe is using to generate the texture, and you can't interfere with any of its existing operations. I've played around with using result.confidenceMask.getGL() and attempted to cache the GL parameters with gl.getParameter(); after drawing the WebGLTexture I re-apply the same parameters. I'm able to successfully draw a full-screen texture on its own, but when I try to use the same GL context I get "gl.INVALID_OPERATION: The specified command is not allowed for the current state" errors. Without knowing which GL parameters to cache, it's really tough to get a working example.
@torinmb Perhaps a different canvas was created for the second segmenter, so the GL context is also different. You can pass the canvas in the task creation options so that the GL context is fixed.
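A short sketch of that option, assuming the `canvas` field on the tasks-vision task options; `vision` is the FilesetResolver result from earlier, and the model path is a placeholder:

```
// Sketch only: give the task an explicit canvas so its GL context is known and shared.
const canvas = new OffscreenCanvas(640, 480);
const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: { modelAssetPath: 'selfie_segmenter.tflite', delegate: 'GPU' },
  canvas,   // textures from getAsWebGLTexture() belong to this canvas's context
  runningMode: 'VIDEO',
  outputConfidenceMasks: true,
});
```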
I think everyone agrees that the migration from selfie_segmentation is difficult, so I think it would be a good idea to include a utility function in @mediapipe/tasks-vision to simplify the conversion to ImageBitmap.
@satoren @tyrmullen are you planning to include conversion to ImageBitmap in the next release? Or can you provide a working code example with getAsWebGLTexture? Thanks!
@volodymyrl Unfortunately I am not a person at Google, but I could write an example. @torinmb I hope this will be helpful to you.
Thanks @satoren this is so helpful!
@satoren Your example is great! Would you be able to port your logic and update the background replacement sample here? I think a lot of developers will benefit from it. https://github.com/googlesamples/mediapipe/tree/main/tutorials/background_segmenter
If you can, please send a pull request. Thanks!
How do I mix the video and background with the mask using WebGL with this update? This is what I was doing previously with selfie segmentation.
It's still doing a CPU draw of the video in that code. Can I use the returned texture in a separate context for processing with the video?
My shader mix looked like this before, with a smoothing function, since there is no smoothing post-processing:
`outColor = mix(bgTex, vec4(frameColor, 1.0), maskTex.r);`
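A minimal sketch of that kind of mix with a smoothstep feather on the mask edge, written as a fragment-shader string; the uniform and varying names (bgTex, frameTex, maskTex, uv) and the thresholds are illustrative:

```
// Sketch only: feather the mask edge before mixing background and frame.
const blendFragmentShader = `
  precision mediump float;
  varying vec2 uv;
  uniform sampler2D bgTex;
  uniform sampler2D frameTex;
  uniform sampler2D maskTex;
  void main() {
    float m = smoothstep(0.3, 0.7, texture2D(maskTex, uv).r);  // soften the hard edge
    gl_FragColor = mix(texture2D(bgTex, uv), texture2D(frameTex, uv), m);
  }
`;
```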
Rendering the video with the mask in the same context almost works, but I can't get the video to render yet, either mixed or alone.
```
gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.uniform1i(maskTextureLocation, 1);

gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, videoTexture);
gl.uniform1i(frameTextureLocation, 0);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGB, gl.RGB, gl.UNSIGNED_BYTE, video);
```
```
float a = maskTex.r;
gl_FragColor = vec4(frameTex.rgb, a);
```
The deeplab model seems very inaccurate, with blocky edges, and it isn't applying smoothing.
@danrossi CanvasRenderingContext2D is executed on the GPU, so drawImage is also executed on the GPU. It is more efficient to blend once like your method does, but it's not that important.
Your question is a general question about sharing WebGL textures with another canvas; I suggest you do a search on WebGL.
I have this modification of yours. It mixes all elements in WebGL on that same offscreen context, but unlike the segmentation API it's not blending yet; the mask is cutting through to the background. It's a similar shader to what I was doing before. My example is using low CPU and GPU.
So a canvas drawImage of video is not done on the CPU?
There is this message about closing resources for memory leaks
"You seem to be creating MPMask instances without invoking .close(). This leaks resources."
https://codepen.io/danrossi/pen/yLGLmdv
Using the selfie segmenter model instead of deeplab, the mix is working as before and using even fewer resources. I'm not sure what the difference with the mask is and how to apply it differently. It doesn't do softmax smoothing post-processing like the deeplab model, which has those calculators compiled in. I have a shader method for that.
https://codepen.io/danrossi/pen/MWZYgKB
@danrossi

> "You seem to be creating MPMask instances without invoking .close(). This leaks resources."

Oh, thank you. We need to explicitly close the mask when the canvas is passed. My example is fixed.

> My example is using low cpu and gpu.
In my environment, the difference in CPU utilization between my sample and yours was within the margin of error; in what environment did you measure it?
My environment: Windows 11, Ryzen 7 4800U with Radeon graphics
My example
Your example
Windows 11 and an RTX 2060, with just a Ryzen 5 3600. I'd rather mix elements in WebGL than in the canvas, as I'm already doing. But I had made a ticket noting there is a model calculator to mix the video, mask, and background directly in wasm, with smoothing! It's not compiled in for these models. It would save all this work afterwards.
But using deeplab, the mask isn't mixing correctly; it cuts through to the background in the first example, whereas the selfie segmenter model masks the video correctly in the second example.
With deeplab I'm not sure if smoothing is needed; it logs that softmax is active, while the selfie model does need it because of the blocky edges. But I've noticed the deeplab model shows parts of the video that are not the tracked body, which the selfie segmenter doesn't. It detects objects in the background, like fabrics.
Update: I just noticed the canvas render output is square for the texture input when using the offscreen canvas. I may have to grab the mask as a bitmap for a secondary WebGL render, sadly, so that the viewport is correct.
I figured out what is going on with the deeplab model. In the produced mask, the red channel is outside the mask. That's different from the selfie segmenter, where the mask itself is red; hence it was not displaying properly.
Correction: if the source is 720p, the resulting image from the offscreen render is 720p, as long as the viewport is changed to the video dimensions.
Inverting the r channel seems to work, but it shows this model is less accurate on the edges compared to the selfie model.
However, the resulting mask from deeplab has too much edge compared to the selfie model, which has little edge but needs softmax smoothing in the shader.
https://codepen.io/danrossi/pen/yLGLmdv
@satoren Nice example; thanks for writing and sharing this! Two quick things I noticed, in case they can be helpful:
- There are two versions of segmentForVideo: one that uses a callback and one that does not. The callback-based version is more efficient (it saves an extra copy) and doesn't require you to call .close(). The trade-off is that the WebGL texture will only be available during the callback. Since you're trying to either render right then or export as an ImageBitmap for later, I think the callback version is probably better for you (a rough sketch follows below).
- If possible, I'd recommend switching to transferToImageBitmap instead of createImageBitmap, since the latter is async and hence can sometimes introduce some small delays, especially if you're already rendering to the canvas's main (null) framebuffer anyway (usually that's the most annoying part).
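A small sketch of the callback flavor, assuming an existing `segmenter`, a `video` element, and a hypothetical drawMask() renderer; the mask is only valid inside the callback:

```
// Sketch only: callback-based segmentForVideo; no .close() is needed and the mask
// (and its WebGL texture) is only valid while the callback runs.
function onFrame(now /* DOMHighResTimeStamp in ms */) {
  segmenter.segmentForVideo(video, now, (result) => {
    const mask = result.confidenceMasks[0];   // MPMask; GPU-backed with the GPU delegate
    drawMask(mask.getAsWebGLTexture());       // hypothetical renderer: draw it right now
  });
  video.requestVideoFrameCallback(onFrame);
}
video.requestVideoFrameCallback(onFrame);
```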
@danrossi For sharing textures between WebGL contexts, usually the best way is to use an ImageBitmap (you can use texImage2D to convert the ImageBitmap back into a WebGLTexture). So @satoren's example above can be helpful for your use case as well.
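For example, a couple of lines in the consuming context (names are placeholders):

```
// Sketch only: upload the received ImageBitmap into a texture owned by another context.
const tex = otherGl.createTexture();
otherGl.bindTexture(otherGl.TEXTURE_2D, tex);
otherGl.texImage2D(otherGl.TEXTURE_2D, 0, otherGl.RGBA, otherGl.RGBA, otherGl.UNSIGNED_BYTE, bitmap);
```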