MediaPipeUnityPlugin
Export CreateSyncToken from GlContext to C#.
Feature Description
The member function CreateSyncTokenForCurrentExternalContext of GlContext is not exported to the C# side. It is required for synchronized texture copies when using a texture directly on the GPU without a readback.
Current Behaviour/State
Currently, WebCamTexture is read back to the CPU and uploaded to the GPU again before the GPU calculation runs, but WebCamTexture can instead be copied into a TextureFrame directly on the GPU and fed to the calculator without leaving the GPU. This works on Android at least.
However, without a sync point obtained from the Unity side, the calculator won't wait for Unity to finish the copy, which results in heavy flickering. There is a static member function of GlContext called CreateSyncTokenForCurrentExternalContext that is designed for exactly this usage; please export it to the C# side.
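To illustrate why the missing sync point causes flicker, here is a framework-free C++ analogy (Fence and CopyWithSync are invented names for illustration, not MediaPipe or Unity APIs): the producer signals a fence only after the copy has fully finished, and the consumer waits on that fence before reading, so it can never observe a half-written frame.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// A minimal stand-in for a GL fence / sync token (illustrative only).
class Fence {
 public:
  void Signal() {
    std::lock_guard<std::mutex> lock(mutex_);
    signaled_ = true;
    cv_.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_; });
  }
 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Producer copies a "frame", then signals the fence; the consumer waits
// on the fence before reading, so it never sees a half-written frame.
std::vector<int> CopyWithSync(const std::vector<int>& src) {
  std::vector<int> dst(src.size(), 0);
  Fence fence;
  std::thread producer([&] {
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i];  // the "GPU copy"
    fence.Signal();  // analogue of inserting the sync token after the copy
  });
  fence.Wait();  // consumer blocks here until the copy is complete
  std::vector<int> seen = dst;
  producer.join();
  return seen;
}
```

Dropping the Wait() call is the analogue of feeding the texture to the calculator without a sync token: the consumer may then read the destination mid-copy.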
Additional Context
Function position: https://github.com/google/mediapipe/blob/master/mediapipe/gpu/gl_context.h#L322
Could you please tell me the relevant code you're using to input data into the CalculatorGraph? (e.g. copy from WebCamTexture to TextureFrame, give it to the CalculatorGraph, etc...)
In ImageSourceSolution<T>'s Run function, I check for the GLES configType and use Graphics.ConvertTexture to perform the copy on the GPU (Graphics.CopyTexture would trigger an in-place GPU readback on WebCamTexture). If I remove the WaitForEndOfFrame, I get 27 ms of delay on my test device but the result flickers; adding the wait stabilizes the result, but the delay grows to 36 ms. So I think waiting on the GPU side instead could reduce the delay.
// Copy current image to TextureFrame
if (graphRunner.configType == GraphRunner.ConfigType.OpenGLES)
{
    textureFrame.ConvertTextureFrom(imageSource.GetCurrentTexture());
    yield return new WaitForEndOfFrame();
}
else
{
    ReadFromImageSource(imageSource, textureFrame);
}
After digging into it more, the member function CreateSyncToken might be more suitable for this usage, since CreateSyncTokenForCurrentExternalContext won't switch contexts when creating the GLsync.
I haven't investigated thoroughly yet, but I think at least the following two functions need to be ported to Unity:
std::shared_ptr<GlSyncPoint> GlContext::CreateSyncToken()
void GlTextureBuffer::Updated(std::shared_ptr<GlSyncPoint>)
I'm not sure if it's enough to just generate a GlSyncToken.
I currently use GlContext::CreateSyncToken() and GlTextureBuffer::Updated(std::shared_ptr<GlSyncPoint>) to set the sync point, but it turned out to be more complex than I expected.
- Graphics API calls only happen on the rendering thread, so we need to call GL.IssuePluginEvent and set the sync point on the rendering thread. This introduces 2 ms of delay on my device, so a native rendering plugin might be needed.
- MediaPipe uses a dedicated thread for all GL API calls, so WaitUntilRelease and CreateSyncToken are blocked heavily when that dedicated thread is busy dispatching compute work or reading results back from the GPU. On my device, a dispatch or readback may block the thread for 20 ms, so WaitUntilRelease and CreateSyncToken can block the MainThread for the same time in the worst case. The WaitUntilRelease blocking issue already exists in the current code, and CreateSyncToken increases the chance of hitting the block.
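The blocking behavior described above can be simulated without any GL dependency (DedicatedThread and BlockedTime are invented names; this is a sketch of the scheduling problem, not MediaPipe code): a single worker thread runs jobs strictly in order, so even a cheap call that must execute on that thread stalls the caller for as long as the in-flight dispatch takes.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// A single dedicated thread that runs jobs strictly in order, like a
// GL thread (illustrative stand-in, not the real MediaPipe class).
class DedicatedThread {
 public:
  DedicatedThread() : worker_([this] { Loop(); }) {}
  ~DedicatedThread() {
    Post([this] { done_ = true; });
    worker_.join();
  }
  // Post a job and block until the dedicated thread has executed it.
  void RunSync(std::function<void()> job) {
    std::promise<void> finished;
    Post([&] { job(); finished.set_value(); });
    finished.get_future().wait();
  }
  void Post(std::function<void()> job) {
    { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(std::move(job)); }
    cv_.notify_one();
  }
 private:
  void Loop() {
    while (!done_) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !jobs_.empty(); });
        job = std::move(jobs_.front());
        jobs_.pop();
      }
      job();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> jobs_;
  bool done_ = false;
  std::thread worker_;
};

// Returns how long a cheap call (a stand-in for CreateSyncToken) blocks
// the caller when a long "compute dispatch" is already queued.
std::chrono::milliseconds BlockedTime(std::chrono::milliseconds dispatch) {
  DedicatedThread gl_thread;
  gl_thread.Post([=] { std::this_thread::sleep_for(dispatch); });  // busy job
  const auto start = std::chrono::steady_clock::now();
  gl_thread.RunSync([] { /* create sync token: cheap by itself */ });
  return std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
}
```

With a 20 ms dispatch in flight, the cheap call still costs the caller roughly those 20 ms, which is the worst-case MainThread stall described above.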
I hope I can find a way to solve those issues.
Finally, I managed to make it work.
- I use this implementation to insert sync point:
MpReturnCode mp_GlTextureBuffer__InsertProducerSyncPoint(mediapipe::GlTextureBuffer* gl_texture_buffer) {
  TRY_ALL
    // Create a sync token from the current (external) context and mark the
    // buffer as updated so consumers wait on it before reading.
    const auto& producerContext = gl_texture_buffer->GetProducerContext();
    if (producerContext) {
      gl_texture_buffer->Updated(
          mediapipe::GlContext::CreateSyncTokenForCurrentExternalContext(producerContext));
    }
    RETURN_CODE(MpReturnCode::Success);
  CATCH_ALL
}
- Call InsertProducerSyncPoint on the rendering thread in Unity to make sure it happens after the actual copy command.
- I need to use GlExternalFenceSyncPoint rather than GlFenceSyncPoint. The sync point is created directly from the rendering thread's context, with no need to wait for a context switch or thread switch.
- Call WaitUntilRelease of TextureFrame in OnTextureFrameRelease to avoid blocking the MainThread.
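As I understand the difference between the two fence types, it can be sketched like this (TokenCreationCost and the thread names are invented; this only models the scheduling, not real GL calls): a GlFenceSyncPoint-style token must be created on the dedicated GL thread, so its creation queues behind whatever that thread is doing, while a GlExternalFenceSyncPoint-style token is created inline on the calling (rendering) thread and only waited on later.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Ms = std::chrono::milliseconds;

// Measures how long token *creation* blocks the caller while a
// "dedicated GL thread" is busy for `busy` milliseconds.
// external == true models GlExternalFenceSyncPoint: the token is created
// inline on the current thread. external == false models GlFenceSyncPoint:
// creation has to wait until the dedicated thread is free.
Ms TokenCreationCost(bool external, Ms busy) {
  std::promise<void> thread_free;
  std::thread gl_thread([&] {
    std::this_thread::sleep_for(busy);  // e.g. a compute dispatch in flight
    thread_free.set_value();
  });
  const auto start = std::chrono::steady_clock::now();
  if (!external) {
    thread_free.get_future().wait();  // round-trip through the busy thread
  }
  // Creating the fence itself is cheap in both cases, so only the wait
  // above contributes to the measured cost.
  const auto cost = std::chrono::duration_cast<Ms>(
      std::chrono::steady_clock::now() - start);
  gl_thread.join();
  return cost;
}
```

This is why creating the fence directly in the rendering thread's context avoids the stall: the expensive wait moves from token creation to the point where the consumer actually needs the result.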
Calling InsertProducerSyncPoint via GL.IssuePluginEvent is a little ugly, I think, but it only costs 1 ms to wait and is easier than making a native rendering plugin, so I just let it be.
After doing all of this, the GPU readback of WebCamTexture on the MainThread is gone, and the flickering is gone too. But the latency grows to 40 ms; some of that is the WaitUntilRelease portion that wasn't included in the measurement before. So even with the pure GPU method, the latency won't be much better than the normal GPU method :(
By the way, does it run faster when calling AddPacketToInputStream from the callback of AsyncGPUReadback?
https://github.com/homuler/MediaPipeUnityPlugin/blob/9e2d60132659ad7c24cc7442a02c5de63d37975a/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Unity/Experimental/TextureFrame.cs#L130
I think not. I tested without setting the sync point, and the latency is the same as with one. My device is strong enough to run at 60 FPS on the MainThread, so the readback on the MainThread doesn't push the frame time beyond 16 ms and won't affect latency.
The bottleneck is the compute dispatch and result readback on the MediaPipe thread.
I get your point; async readback is surely a better way to reduce MainThread blocking in the existing GPU method (which requires a readback).
After profiling the GPU render stage, my device takes around 30 ms to finish the compute shaders for pose landmark, so there might be no way to reduce latency further unless a faster model is used (I already use the lite model with a 320x240 camera texture resolution).
Conclusion:
Export CreateSyncTokenForCurrentExternalContext; however, the lifecycle of the newly created shared_ptr might be difficult to manage on the C# side, so I directly export a new interface on GlTextureBuffer to simplify the usage, like the code above.
The latency might improve if the readback in the current GPU method is slow on your device, or stay similar if the readback is already fast enough.
By the way, triggering AddPacketToInputStream at RenderPipelineManager.beginContextRendering (URP) gets the camera texture earlier, since the camera texture is updated in PostLateUpdate in Unity. Currently, using WaitForEndOfFrame lets the rendering logic finish before we call AddPacketToInputStream.