
@mediapipe/tasks-vision background replacement quality

Open vlysytsia opened this issue 2 years ago • 34 comments

Hello, I rewrote the @mediapipe/selfie_segmentation background replacement to @mediapipe/tasks-vision, and the quality of the background replacement is noticeably worse. I used the code from the official documentation as an example and changed the part responsible for segmentation: https://codepen.io/volodymyrl/pen/mdQKMdR?editors=0010. Is something wrong with my code, or is this how the model works (and if so, why is it worse than the deprecated @mediapipe/selfie_segmentation)?

vlysytsia avatar Jul 20 '23 11:07 vlysytsia

@volodymyrl,

Could you please share a reference example to compare against the new tasks-vision quality? That would help us bring this to internal notice. Thank you.

kuaashish avatar Jul 24 '23 11:07 kuaashish

Hey @kuaashish, thanks for your answer. Here is an example of background replacement with @mediapipe/selfie_segmentation: https://codepen.io/Guimauve01/pen/wvEaVrN

vlysytsia avatar Jul 25 '23 09:07 vlysytsia

You should use confidenceMask instead of categoryMask to make the edge smoother. Check out this example: https://codepen.io/khanhlvg/full/WNYaqNW

khanhlvg avatar Jul 27 '23 23:07 khanhlvg

Also, I had to write a GLSL shader to prevent GPU -> CPU transfers to get performance on par with selfie_segmentation.

satoren avatar Jul 28 '23 06:07 satoren

Hello @volodymyrl,

Could you please go through the above comment? Thank you.

kuaashish avatar Jul 28 '23 09:07 kuaashish

@kuaashish @khanhlvg, thanks for the example. I used it to prepare another example that combines both versions: https://codepen.io/volodymyrl/pen/VwVVjxd?editors=1111. With @mediapipe/selfie_segmentation on the left and @mediapipe/tasks-vision on the right, you can see that @mediapipe/tasks-vision is still worse (the image has torn edges).

@satoren, sorry, I am not familiar with GLSL shaders. Can you please explain what I need to do to improve MediaPipe performance?

vlysytsia avatar Jul 28 '23 16:07 vlysytsia

The difference you're seeing is mostly caused by the visualization logic. I tweaked the edge smoothing logic a bit to reduce the visibility of the edge. If you set minConfidence = 0 and maxConfidence = 1, you'll get the same result as the legacy selfie segmentation SDK. (screenshot)

Besides, in the legacy SDK you're using the square input model (modelSelection: 0). If your input is a landscape image, you should switch that to 1.
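
For clarity, the kind of visualization tweak I mean looks roughly like this (a sketch, not the exact code from the codepen; minConfidence and maxConfidence are just knobs in the compositing step, not segmenter options):

```
// Remap the raw confidence into the [minConfidence, maxConfidence] window when blending.
// With minConfidence = 0 and maxConfidence = 1 this is a plain linear blend, i.e. the
// legacy selfie_segmentation behavior.
function blendPixel(frameRGBA, backgroundRGBA, confidence, minConfidence, maxConfidence) {
  const t = (confidence - minConfidence) / (maxConfidence - minConfidence);
  const alpha = Math.min(1, Math.max(0, t)); // clamp to [0, 1]
  // alpha = 1 keeps the camera pixel, alpha = 0 keeps the background pixel.
  return frameRGBA.map((c, i) => alpha * c + (1 - alpha) * backgroundRGBA[i]);
}
```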

khanhlvg avatar Jul 28 '23 18:07 khanhlvg

@volodymyrl

> Can you please explain what I need to do to improve MediaPipe performance?

I'm sorry, I can't post the code, but I hope this gives you a hint.

For best performance, I needed to get the mask as a mask image, like in selfie segmentation. See the MPMask comments.

Here is the approach I took (a rough sketch of the Float32Array case follows the list).

  1. Check which type the MPMask holds.

    • If hasUint8Array(): create a new ImageData, copy the data from mask.getAsUint8Array(), and convert it to an ImageBitmap with createImageBitmap for use with CanvasRenderingContext2D drawImage.
    • If hasFloat32Array(): almost the same as the Uint8Array case, except getAsFloat32Array() is used.
    • If hasWebGLTexture(): use getAsWebGLTexture() to get the texture and render it to a canvas with WebGL (the part of MPImage, a class similar to MPMask, that converts to ImageBitmap may be helpful), then convert it to an ImageBitmap with createImageBitmap for use with drawImage.
  2. Use the resulting ImageBitmap like segmentationMask in selfie_segmentation.
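
To make step 1 concrete, here is a rough, untested sketch of the hasFloat32Array branch (the WebGL branch needs a shader pass instead):

```
// Convert a single-channel float confidence mask into an ImageBitmap.
async function confidenceMaskToImageBitmap(mask) {
  const { width, height } = mask;
  const values = mask.getAsFloat32Array();   // one confidence value per pixel, in [0, 1]
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < values.length; i++) {
    const v = Math.round(values[i] * 255);
    rgba[i * 4 + 0] = v;   // write the confidence into R, G and B
    rgba[i * 4 + 1] = v;
    rgba[i * 4 + 2] = v;
    rgba[i * 4 + 3] = 255; // fully opaque
  }
  const imageData = new ImageData(rgba, width, height);
  // The result can be used like `results.segmentationMask` in the old selfie_segmentation API.
  return createImageBitmap(imageData);
}
```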

Since the above steps are a bit cumbersome, I think one option is to continue using selfie segmentation until an easy-to-use environment is available.

This may be best discussed on #4491.

satoren avatar Jul 31 '23 01:07 satoren

This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Aug 09 '23 01:08 github-actions[bot]

@satoren @khanhlvg, thanks for your answers! I updated the replace-background function, and it looks acceptable, but it's still worse than @mediapipe/selfie_segmentation, not to mention how background replacement works in, for example, Zoom. @satoren, I can't use @mediapipe/selfie_segmentation because it throws errors when you try to stop it, and the only suggested fix was to upgrade to the tasks version (https://github.com/google/mediapipe/issues/3373).

vlysytsia avatar Aug 09 '23 09:08 vlysytsia

@khanhlvg,

Could you please look into this issue? Thank you

kuaashish avatar Aug 09 '23 10:08 kuaashish

@volodymyrl How about this? https://codepen.io/satoren/pen/rNQXRqp This runs on the CPU and is not optimized for performance like the approach here: https://github.com/google/mediapipe/issues/4630#issuecomment-1657373951

satoren avatar Aug 15 '23 05:08 satoren

Hey @satoren, thanks for your answer.

> https://codepen.io/satoren/pen/rNQXRqp

In this example, the quality looks the same as with @mediapipe/selfie_segmentation.

> How about this? This is running on the CPU and not optimized for performance as https://github.com/google/mediapipe/issues/4630#issuecomment-1657373951.

I use the second approach with getAsFloat32Array, but it is slow. To improve performance, I tried to use a Web Worker, but there are some limitations on the data you can post from the Web Worker: https://github.com/google/mediapipe/issues/4694

vlysytsia avatar Aug 15 '23 13:08 vlysytsia

You can transfer it by converting it to an ImageBitmap.
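
An ImageBitmap is a transferable object, so posting it out of the worker is cheap. Roughly (untested sketch):

```
// Inside the worker, after rendering the mask into an OffscreenCanvas:
const maskBitmap = offscreenCanvas.transferToImageBitmap();
// Transfer (not copy) the bitmap back to the main thread.
self.postMessage({ maskBitmap }, [maskBitmap]);
```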

satoren avatar Aug 15 '23 13:08 satoren

Also, using a webworker won't make it any faster. The bottleneck for this is the transfer from GPU to CPU. Try to find an efficient way to convert from WebGLTexture to ImageBitmap.

satoren avatar Aug 15 '23 13:08 satoren

+1 to everything @satoren said :). In particular, try to keep things on the GPU for best performance. This is especially true for segmentation, since it can return an image and therefore can run 100% on the GPU, allowing for nice pipelining. So here's a quick summary for best segmentation performance (a setup sketch follows the list):

  • Run with the 'GPU' delegate
  • When running with 'GPU' delegate, avoid using any getAs*Array() calls; exclusively use getAsWebGLTexture() instead for best performance
  • Prefer confidence masks without category masks, if that works for your use case, as those are cheaper to compute, and provide more information
  • Using the above means you'll need to write a GL shader to render the texture to your OffscreenCanvas; if you need it as an ImageBitmap after that, you can then call transferToImageBitmap (createImageBitmap is async so could introduce a tiny bit extra delay)
  • The MPMask code and ImageShaderContext code do have examples of shader applications, but honestly, since you're just wanting a simple passthrough shader, I'd guess that if you just search for a WebGL shader tutorial, that would probably give you better examples for your purposes; hopefully we'll have some better examples of this shortly
  • Technically, there are some cases where you'll get better results on CPU (or even have to use CPU); for example, if you need to run 10 things at once, you'd want CPU ML inference for most of them, since that parallelizes better than GPU ML inference on browser.
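
To spell out the first few bullets, the segmenter setup would look roughly like this (a sketch; the model path is just an example):

```
import { FilesetResolver, ImageSegmenter } from '@mediapipe/tasks-vision';

const vision = await FilesetResolver.forVisionTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm'
);
const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: 'selfie_segmenter.tflite', // whichever model you're using
    delegate: 'GPU',                           // keep inference on the GPU
  },
  runningMode: 'VIDEO',
  outputConfidenceMasks: true,                 // cheaper to compute, more information
  outputCategoryMasks: false,
});
```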

tyrmullen avatar Aug 15 '23 22:08 tyrmullen

@satoren @tyrmullen thank you for the detailed explanation. If getAsWebGLTexture() works better than getAs*Array(), it would be great to have an example of its usage.

vlysytsia avatar Aug 16 '23 08:08 vlysytsia

While it's straightforward to set up a shader to render a full-screen texture, the main problem is that if you're receiving a WebGLTexture, you need to use the same gl context that MediaPipe is using to generate the texture, without interfering with any of the existing operations. I've played around with using result.confidenceMask.getGL() and attempted to cache the gl parameters with gl.getParameter(). After attempting to draw the WebGLTexture, I re-apply the same parameters. I'm able to successfully draw a full-screen texture on its own, but when I try to use the same gl context I get errors of "gl.INVALID_OPERATION: The specified command is not allowed for the current state." Without knowing which gl parameters to cache, it's really tough to get a working example.

torinmb avatar Aug 16 '23 20:08 torinmb

@torinmb Perhaps a different canvas was created for the second segmenter, so the gl context is also different. You can pass the canvas in the task creation options so that the gl context is fixed.
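
Roughly like this (a sketch; I believe the vision task options accept a canvas so the task creates its GL context on a canvas you control):

```
// `vision` comes from FilesetResolver.forVisionTasks(...) as usual.
const glCanvas = document.createElement('canvas'); // or an OffscreenCanvas you own
const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: { modelAssetPath: 'selfie_segmenter.tflite', delegate: 'GPU' },
  runningMode: 'VIDEO',
  outputConfidenceMasks: true,
  canvas: glCanvas, // the returned WebGLTexture then lives on this canvas's context
});
```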

I think everyone agrees that the migration from selfie_segmentation is difficult, so I think it would be a good idea to include a utility function in @mediapipe/tasks-vision to simplify the conversion to ImageBitmap.

satoren avatar Aug 17 '23 00:08 satoren

> I think everyone agrees that the migration from selfie_segmentation is difficult, so I think it would be a good idea to include a utility function in @mediapipe/tasks-vision to simplify the conversion to ImageBitmap.

@satoren @tyrmullen are you planning to include the conversion to ImageBitmap in the next release? Or can you provide a working code example with getAsWebGLTexture? Thanks!

vlysytsia avatar Aug 17 '23 07:08 vlysytsia

@volodymyrl Unfortunately, I am not someone at Google, but I could write an example. @torinmb I hope this will be helpful to you.

satoren avatar Aug 17 '23 09:08 satoren

Thanks @satoren this is so helpful!

torinmb avatar Aug 17 '23 16:08 torinmb

@satoren Your example is great! Would you be able to port your logic and update the background replacement sample here? I think a lot of developers will benefit from it. https://github.com/googlesamples/mediapipe/tree/main/tutorials/background_segmenter

If you can, please send a pull request. Thanks!

khanhlvg avatar Aug 17 '23 18:08 khanhlvg

How do I mix the video and the background with the mask using WebGL with this update? This is what I was doing with selfie segmentation previously.

It's still doing a CPU draw of the video in that code. Can I use the returned texture in a separate context for processing with the video?

My shader mix looked like this before, with a smoothing function, since there is no smoothing post-processing:

```
outColor = mix(bgTex, vec4(frameColor, 1.0), maskTex.r);
```

Rendering the video with the mask in the same context almost works, but I can't get the video to render yet, either mixed or alone.

```
gl.activeTexture(gl.TEXTURE1)
gl.bindTexture(gl.TEXTURE_2D, texture)       // mask texture from the segmenter
gl.uniform1i(maskTextureLocation, 1)

gl.activeTexture(gl.TEXTURE0)
gl.bindTexture(gl.TEXTURE_2D, videoTexture)  // texture for the camera frame
gl.uniform1i(frameTextureLocation, 0)

// upload the current video frame into the bound texture
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGB, gl.RGB, gl.UNSIGNED_BYTE, video);
```

The fragment shader does:

```
float a = maskTex.r;
gl_FragColor = vec4(frameTex.rgb, a);
```
    
The DeepLab model seems very inaccurate: blocky edges, and no smoothing is applied.
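
For reference, the kind of mix-with-smoothing fragment shader I was describing looks roughly like this (a sketch; the uniform and varying names are illustrative):

```
const fragmentShaderSource = `
  precision mediump float;
  uniform sampler2D frameTex;  // camera frame
  uniform sampler2D bgTex;     // replacement background
  uniform sampler2D maskTex;   // confidence mask from the segmenter
  varying vec2 vTexCoord;
  void main() {
    float confidence = texture2D(maskTex, vTexCoord).r;
    // smoothstep softens the edge instead of a hard cutoff
    float alpha = smoothstep(0.2, 0.8, confidence);
    gl_FragColor = mix(texture2D(bgTex, vTexCoord), texture2D(frameTex, vTexCoord), alpha);
  }
`;
```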
  

danrossi avatar Aug 19 '23 17:08 danrossi

@danrossi CanvasRenderingContext2D is executed on the GPU, so drawImage is also executed on the GPU. It is more efficient to blend once like your method does, but it doesn't matter much.

Your question is a general one about sharing WebGL textures across canvases, so I suggest searching for how to do that with WebGL.

satoren avatar Aug 20 '23 00:08 satoren

I have this modification of yours. It mixes all the elements in WebGL on that same offscreen context, but unlike the segmentation API it's not blending yet; the mask is cutting through to the background. It's a similar shader to what I was doing before. My example uses low CPU and GPU.

So the canvas drawImage of the video is not on the CPU?

There is this message about closing resources for memory leaks

"You seem to be creating MPMask instances without invoking .close(). This leaks resources."

https://codepen.io/danrossi/pen/yLGLmdv

Using the selfie segmenter model instead of DeepLab, the mix works as before and uses even fewer resources. I'm not sure what the difference in the mask is and how to apply it differently. It doesn't do softmax smoothing post-processing like the DeepLab model, which has those calculators compiled in. I have a shader method for that.

https://codepen.io/danrossi/pen/MWZYgKB

danrossi avatar Aug 20 '23 09:08 danrossi

@danrossi

"You seem to be creating MPMask instances without invoking .close(). This leaks resources."

Oh, thank you. I needed to explicitly close the masks since I passed in the canvas. My example is fixed.
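
The fix is just to close each mask once the compositing step is done with it (sketch; drawComposite is a placeholder for the blending code):

```
const result = segmenter.segmentForVideo(video, performance.now());
const mask = result.confidenceMasks[0];
drawComposite(mask);  // placeholder for whatever reads/uploads the mask
mask.close();         // releases the underlying buffer/texture and silences the warning
```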

> My example uses low CPU and GPU.

In my environment, the difference in CPU utilization between my sample and yours was within the margin of error. In what environment did you measure it?

My environment: Windows 11 Ryzen 7 4800U with Radeon graphics

My example: (screenshot)

Your example: (screenshot)

satoren avatar Aug 20 '23 11:08 satoren

Windows 11 and an RTX 2060, with just a Ryzen 5 3600. I'd rather mix the elements in WebGL than in the canvas, as I'm already doing. But I had opened a ticket about this: there is a model calculator that can mix the video, mask, and background directly in WASM, with smoothing! It just isn't compiled in for these models. It would save all this work afterwards.

But DeepLab isn't mixing the mask correctly; it cuts through to the background in the first example, whereas the selfie segmenter model masks the video correctly in the second example.

With DeepLab I'm not sure if smoothing is needed; it logs that softmax is active. The selfie model does need it, with blocky edges. But I've noticed the DeepLab model shows parts of the video that are not the tracked body, while the selfie segmenter doesn't; it detects objects in the background like fabrics.

Update: I just noticed the canvas render output is square for the texture input when using the offscreen canvas. I may sadly have to grab the mask as a bitmap into a secondary WebGL renderer so the viewport is correct.

danrossi avatar Aug 20 '23 11:08 danrossi

I figured out what is going on with the DeepLab model. In the produced mask, the red channel marks what is outside the mask. That's different from the selfie segmenter, where the mask itself is red, hence it was not displaying properly.

Correction: if the source is 720p, the resulting image from the offscreen render is 720p, as long as the viewport is changed to the video dimensions.

Inverting the r channel seems to work, but it shows this model is less accurate on the edges compared to the selfie model.

However, the resulting mask from DeepLab has too much edge compared to the selfie model, which has little edge but needs softmax smoothing in the shader.
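
Concretely, the inversion is a one-line change in the fragment shader string (sketch):

```
const fragmentSnippet = `
  // DeepLab: the person region reads low in the red channel here, so flip it.
  float a = 1.0 - texture2D(maskTex, vTexCoord).r;
  gl_FragColor = vec4(texture2D(frameTex, vTexCoord).rgb, a);
`;
```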

https://codepen.io/danrossi/pen/yLGLmdv

(screenshots)

danrossi avatar Aug 20 '23 12:08 danrossi

@satoren Nice example; thanks for writing and sharing it! Two quick things I noticed, in case they're helpful:

  • There are two versions of segmentForVideo: one that uses a callback and one that does not. The callback-based version is more efficient (it saves an extra copy) and doesn't require you to call .close(). The trade-off is that the WebGL texture is only available during the callback. Since you're either rendering right then or exporting as an ImageBitmap for later, I think the callback version is probably better for you.

  • If possible, I'd recommend switching to transferToImageBitmap instead of createImageBitmap, since the latter is async and can sometimes introduce small delays, especially if you're already rendering to the canvas's main (null) framebuffer anyway (usually that's the most annoying part). A rough sketch combining both suggestions follows the list.
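
Putting both together, roughly (an untested sketch; renderMaskToCanvas and composite stand in for your own GL pass-through draw and blending step, and offscreen is the OffscreenCanvas you draw into):

```
segmenter.segmentForVideo(video, performance.now(), (result) => {
  // The texture is only valid inside this callback, and no .close() is needed.
  const texture = result.confidenceMasks[0].getAsWebGLTexture();
  renderMaskToCanvas(texture, offscreen);               // placeholder: full-screen textured quad
  const maskBitmap = offscreen.transferToImageBitmap(); // synchronous, unlike createImageBitmap
  composite(video, maskBitmap);                         // placeholder: your blending step
});
```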

@danrossi For sharing textures between WebGL contexts, the best way is usually to go through an ImageBitmap (you can use texImage2D to convert an ImageBitmap back into a WebGLTexture, as sketched below). So @satoren's example above can be helpful for your use case as well.
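
The upload back into the other context is just (sketch; gl2 is the second context and maskBitmap is the transferred ImageBitmap):

```
const tex = gl2.createTexture();
gl2.bindTexture(gl2.TEXTURE_2D, tex);
gl2.texParameteri(gl2.TEXTURE_2D, gl2.TEXTURE_MIN_FILTER, gl2.LINEAR);
gl2.texImage2D(gl2.TEXTURE_2D, 0, gl2.RGBA, gl2.RGBA, gl2.UNSIGNED_BYTE, maskBitmap);
```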

tyrmullen avatar Aug 24 '23 23:08 tyrmullen