MediaPipeUnityPlugin
Export CreateSyncToken from GlContext to C#.
Feature Description
The member function CreateSyncTokenForCurrentExternalContext of GlContext is not exported to the C# side. It is required for synchronized texture copies when using a texture directly on the GPU without a readback.
Current Behaviour/State
Currently, WebCamTexture is read back to the CPU and uploaded to the GPU again before the GPU calculation runs, but WebCamTexture can instead be copied into a TextureFrame directly on the GPU and fed to the calculator without leaving the GPU. This works on Android at least.
However, without a sync point obtained from the Unity side, the calculator won't wait for Unity to finish the copy, which results in heavy flickering. There is a static member function of GlContext called CreateSyncTokenForCurrentExternalContext that is designed for exactly this usage; please export it to the C# side.
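To illustrate why the missing sync point causes flicker, here is a framework-free C++ analogy (Fence and CopyWithSync are invented names for illustration, not MediaPipe or Unity APIs): the producer signals a fence only after the copy has fully finished, and the consumer waits on that fence before reading, so it can never observe a half-written frame.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// A minimal stand-in for a GL fence / sync token (illustrative only).
class Fence {
 public:
  void Signal() {
    std::lock_guard<std::mutex> lock(mutex_);
    signaled_ = true;
    cv_.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_; });
  }
 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Producer copies a "frame", then signals the fence; the consumer waits
// on the fence before reading, so it never sees a half-written frame.
std::vector<int> CopyWithSync(const std::vector<int>& src) {
  std::vector<int> dst(src.size(), 0);
  Fence fence;
  std::thread producer([&] {
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i];  // the "GPU copy"
    fence.Signal();  // analogue of inserting the sync token after the copy
  });
  fence.Wait();  // consumer blocks here until the copy is complete
  std::vector<int> seen = dst;
  producer.join();
  return seen;
}
```

Dropping the Wait() call is the analogue of feeding the texture to the calculator without a sync token: the consumer may then read the destination mid-copy.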
Additional Context
Function position: https://github.com/google/mediapipe/blob/master/mediapipe/gpu/gl_context.h#L322
Could you please tell me the relevant code you're using to input data into the CalculatorGraph? (e.g. copy from WebCamTexture to TextureFrame, give it to the CalculatorGraph, etc...)
In ImageSourceSolution<T>'s Run function, I check for the GLES configType and use Graphics.ConvertTexture to perform the copy on the GPU (Graphics.CopyTexture would trigger an in-place GPU readback on WebCamTexture). If I remove the WaitForEndOfFrame, I get 27 ms of delay on my test device but the result flickers; adding the wait stabilizes the result, but the delay grows to 36 ms. So I think waiting on the GPU side instead could reduce the delay.
// Copy current image to TextureFrame
if (graphRunner.configType == GraphRunner.ConfigType.OpenGLES)
{
    textureFrame.ConvertTextureFrom(imageSource.GetCurrentTexture());
    yield return new WaitForEndOfFrame();
}
else
{
    ReadFromImageSource(imageSource, textureFrame);
}
After digging into it more, the member function CreateSyncToken might be more suitable for this usage, since CreateSyncTokenForCurrentExternalContext won't switch contexts when creating the GLsync.
I haven't investigated thoroughly yet, but I think at least the following two functions need to be ported to Unity:
std::shared_ptr<GlSyncPoint> GlContext::CreateSyncToken()
void GlTextureBuffer::Updated(std::shared_ptr<GlSyncPoint>)
I'm not sure if it's enough to just generate a GlSyncToken.
I currently use GlContext::CreateSyncToken() and GlTextureBuffer::Updated(std::shared_ptr<GlSyncPoint>) to set the sync point, but it turned out to be more complex than I expected.
- Graphics API calls only happen on the rendering thread, so we need to call GL.IssuePluginEvent and set the sync point on the rendering thread. This introduces 2 ms of delay on my device, so a native rendering plugin might be needed.
- MediaPipe uses a dedicated thread for all GL API calls, so WaitUntilRelease and CreateSyncToken are blocked heavily when that dedicated thread is busy dispatching compute work or reading results back from the GPU. On my device, a dispatch or readback may block the thread for 20 ms, so WaitUntilRelease and CreateSyncToken can block the MainThread for the same time in the worst case. The WaitUntilRelease blocking issue already exists in the current code, and CreateSyncToken increases the chance of hitting the block.
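The blocking behavior described above can be simulated without any GL dependency (DedicatedThread and BlockedTime are invented names; this is a sketch of the scheduling problem, not MediaPipe code): a single worker thread runs jobs strictly in order, so even a cheap call that must execute on that thread stalls the caller for as long as the in-flight dispatch takes.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// A single dedicated thread that runs jobs strictly in order, like a
// GL thread (illustrative stand-in, not the real MediaPipe class).
class DedicatedThread {
 public:
  DedicatedThread() : worker_([this] { Loop(); }) {}
  ~DedicatedThread() {
    Post([this] { done_ = true; });
    worker_.join();
  }
  // Post a job and block until the dedicated thread has executed it.
  void RunSync(std::function<void()> job) {
    std::promise<void> finished;
    Post([&] { job(); finished.set_value(); });
    finished.get_future().wait();
  }
  void Post(std::function<void()> job) {
    { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(std::move(job)); }
    cv_.notify_one();
  }
 private:
  void Loop() {
    while (!done_) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !jobs_.empty(); });
        job = std::move(jobs_.front());
        jobs_.pop();
      }
      job();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> jobs_;
  bool done_ = false;
  std::thread worker_;
};

// Returns how long a cheap call (a stand-in for CreateSyncToken) blocks
// the caller when a long "compute dispatch" is already queued.
std::chrono::milliseconds BlockedTime(std::chrono::milliseconds dispatch) {
  DedicatedThread gl_thread;
  gl_thread.Post([=] { std::this_thread::sleep_for(dispatch); });  // busy job
  const auto start = std::chrono::steady_clock::now();
  gl_thread.RunSync([] { /* create sync token: cheap by itself */ });
  return std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
}
```

With a 20 ms dispatch in flight, the cheap call still costs the caller roughly those 20 ms, which is the worst-case MainThread stall described above.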
I hope I can find a way to solve those issues.
Finally, I managed to make it work.
- I use this implementation to insert sync point:
MpReturnCode mp_GlTextureBuffer__InsertProducerSyncPoint(mediapipe::GlTextureBuffer* gl_texture_buffer) {
  TRY_ALL
    // Create a sync token from the current (external) context and mark the
    // buffer as updated so consumers wait on it before reading.
    const auto& producerContext = gl_texture_buffer->GetProducerContext();
    if (producerContext) {
      gl_texture_buffer->Updated(
          mediapipe::GlContext::CreateSyncTokenForCurrentExternalContext(producerContext));
    }
    RETURN_CODE(MpReturnCode::Success);
  CATCH_ALL
}
- Call InsertProducerSyncPoint on the rendering thread in Unity to make sure it happens after the actual copy command.
- I need to use GlExternalFenceSyncPoint rather than GlFenceSyncPoint. The sync point is created directly from the rendering thread's context, with no need to wait for a context switch or thread switch.
- Call WaitUntilRelease of TextureFrame in OnTextureFrameRelease to avoid blocking the MainThread.
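As I understand the difference between the two fence types, it can be sketched like this (TokenCreationCost and the thread names are invented; this only models the scheduling, not real GL calls): a GlFenceSyncPoint-style token must be created on the dedicated GL thread, so its creation queues behind whatever that thread is doing, while a GlExternalFenceSyncPoint-style token is created inline on the calling (rendering) thread and only waited on later.

```cpp
#include <chrono>
#include <future>
#include <thread>

using Ms = std::chrono::milliseconds;

// Measures how long token *creation* blocks the caller while a
// "dedicated GL thread" is busy for `busy` milliseconds.
// external == true models GlExternalFenceSyncPoint: the token is created
// inline on the current thread. external == false models GlFenceSyncPoint:
// creation has to wait until the dedicated thread is free.
Ms TokenCreationCost(bool external, Ms busy) {
  std::promise<void> thread_free;
  std::thread gl_thread([&] {
    std::this_thread::sleep_for(busy);  // e.g. a compute dispatch in flight
    thread_free.set_value();
  });
  const auto start = std::chrono::steady_clock::now();
  if (!external) {
    thread_free.get_future().wait();  // round-trip through the busy thread
  }
  // Creating the fence itself is cheap in both cases, so only the wait
  // above contributes to the measured cost.
  const auto cost = std::chrono::duration_cast<Ms>(
      std::chrono::steady_clock::now() - start);
  gl_thread.join();
  return cost;
}
```

This is why creating the fence directly in the rendering thread's context avoids the stall: the expensive wait moves from token creation to the point where the consumer actually needs the result.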
Calling InsertProducerSyncPoint via GL.IssuePluginEvent is a little ugly, I think, but it only costs 1 ms to wait and is easier than making a native rendering plugin, so I just let it be.
After doing all of this, the GPU readback of WebCamTexture on the MainThread is gone, and the flickering is gone too. But the latency grows to 40 ms; some of that is the WaitUntilRelease portion that wasn't included in the measurement before. So even with the pure GPU method, the latency won't be much better than the normal GPU method :(
By the way, does it run faster when calling AddPacketToInputStream from the callback of AsyncGPUReadback?
https://github.com/homuler/MediaPipeUnityPlugin/blob/9e2d60132659ad7c24cc7442a02c5de63d37975a/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Unity/Experimental/TextureFrame.cs#L130
I think not. I tested without setting the sync point, and the latency is the same as with one. My device is strong enough to run at 60 FPS on the MainThread, so the readback on the MainThread doesn't push the frame time beyond 16 ms and won't affect latency.
The bottleneck is the compute dispatch and result readback on the MediaPipe thread.
I get your point; async readback is surely a better way to reduce MainThread blocking in the existing GPU method (which requires a readback).
After profiling the GPU render stage, my device takes around 30 ms to finish the compute shaders for pose landmark, so there might be no way to reduce latency further unless a faster model is used (I already use the lite model with a 320x240 camera texture resolution).
Conclusion:
Export CreateSyncTokenForCurrentExternalContext; however, the lifecycle of the newly created shared_ptr might be difficult to manage on the C# side, so I directly export a new interface on GlTextureBuffer to simplify the usage, like the code above.
The latency might improve if the readback in the current GPU method is slow on your device, or stay similar if the readback is already fast enough.
By the way, triggering AddPacketToInputStream at RenderPipelineManager.beginContextRendering (URP) gets the camera texture earlier, since the camera texture is updated in PostLateUpdate in Unity. Currently, using WaitForEndOfFrame lets the rendering logic finish before we call AddPacketToInputStream.