[BUG]: Working poorly with 2+ participants on Bidirectional Sample

Open avicarpio opened this issue 3 years ago • 10 comments

Package version

3.1.0-exp.4

Environment

* OS:  Ubuntu 22.04
* Unity version: 2020.3.35f1
* Graphics API: OpenGL
* Browser: Chrome

Steps To Reproduce

  1. Use Bidirectional Sample
  2. Connect more than 2 participants (inputs)

Current Behavior

With 2 participants, CPU load is around 20%, which is fine. But when a third or fourth participant joins the "call", the CPU reaches 100% load and the remote video becomes very laggy.

Expected Behavior

It should work with more than 2 participants. Maybe this happens because it uses an MCU, and an SFU is needed for this. I have seen in the FAQ that SFU is not supported at the moment. Is there any plan to implement it in the near future? Would it be very difficult? Could I implement an SFU by forking your GitHub repo, or is that not possible?

Anything else?

No response

avicarpio avatar Oct 27 '22 10:10 avicarpio

I can also reproduce this issue. That's why I thought an SFU solution (either connecting to an existing SFU or writing one within URS) would alleviate the load... But it's strange that the CPU goes up with only 2-3 people. Using 16 vCPUs.

eulersson avatar Oct 27 '22 10:10 eulersson

@avicarpio I would like to see the CPU loads on the Unity Profiler.

karasusan avatar Oct 28 '22 02:10 karasusan

Of course. What I attached is a Bidirectional sample scene with 4 contributors on localhost. This is running on a high-end machine, so it should work flawlessly, but the result is ~10 FPS, and performance gets worse over time.

Specs:

  • CPU: Ryzen 7 2700X, 16 vCPUs
  • GPU: RTX 3080 Ti 12 GB
  • RAM: 32 GB DDR4

If you need me to do any more tests or if you need more information, just ask me. Thank you @karasusan.

URSProfilingCPUHighLoad.zip

htop screenshots:

  • With 2 contributors: Screenshot from 2022-10-28 09-32-39
  • With 3-4 contributors: Screenshot from 2022-10-28 09-31-33

avicarpio avatar Oct 28 '22 07:10 avicarpio

@avicarpio Thank you. I have an additional question. Can you see the loads in the Profiler window in the Unity Editor? I would like to know which thread has the highest load. I assume it is the encoder thread.

karasusan avatar Oct 31 '22 10:10 karasusan

@karasusan I sent you a .zip with the profiler info; maybe you didn't notice it. There you can see that the problem seems to be the RenderStreaming.Update() process (not sure though). I tried deactivating all the decoders/encoders and the 100% CPU load was still there, so it doesn't seem to be a video encoding/decoding problem.

What I think is that when several users join the Bidirectional sample, the remote video is what causes the bottleneck. Disabling that return video on the Unity side solves the lag. So the workaround (not a definitive solution) I found is to create 2 Signaling objects on different ports: one only for inputs and the other only for outputs, while making sure that the broadcast WebRTC stream (output port) is consumed by ONLY ONE user. Following those steps, the result is really fluid: a scene with 3-4 contributors goes from 100% CPU load at ~10 FPS to ~35% CPU load at 60 FPS.
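The two-endpoint workaround described above can be sketched roughly as follows. This is only an illustration of the routing rule (inputs unrestricted, output limited to a single consumer); `ConnectionGate`, `SignalingEndpoint`, and the port numbers 8080/8081 are hypothetical names and values, not part of URS or its signaling server.

```typescript
// Hypothetical sketch: these names and ports are illustrative, not URS APIs.
type Role = "input" | "output";

interface SignalingEndpoint {
  role: Role;
  port: number;
  maxConnections: number; // Infinity means "no limit"
}

// One signaling endpoint per direction, each on its own (assumed) port.
const endpoints: SignalingEndpoint[] = [
  { role: "input", port: 8080, maxConnections: Infinity },
  { role: "output", port: 8081, maxConnections: 1 }, // single consumer only
];

class ConnectionGate {
  private readonly config: SignalingEndpoint[];
  private readonly active = new Map<Role, Set<string>>();

  constructor(config: SignalingEndpoint[]) {
    this.config = config;
    for (const endpoint of config) {
      this.active.set(endpoint.role, new Set<string>());
    }
  }

  // Returns true if the connection is admitted on the endpoint for `role`.
  tryConnect(role: Role, connectionId: string): boolean {
    const endpoint = this.config.find((e) => e.role === role);
    const connections = this.active.get(role);
    if (endpoint === undefined || connections === undefined) return false;
    if (connections.has(connectionId)) return true; // already admitted
    if (connections.size >= endpoint.maxConnections) return false;
    connections.add(connectionId);
    return true;
  }

  // Frees the slot so another consumer can take it.
  disconnect(role: Role, connectionId: string): void {
    const connections = this.active.get(role);
    if (connections !== undefined) connections.delete(connectionId);
  }
}
```

With this gate, any number of contributors can connect on the input endpoint, while a second viewer on the output endpoint is rejected until the first one disconnects.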

So it seems the cause is that URS is based on an MCU, whereas an SFU would be a better solution, or at least it should be possible to combine them. Creating a simple SFU server on the broadcast side, passing the GameView data to it as a contributor, and having the rest of the users consume only that stream could resolve the high CPU load and improve URS usability and performance.
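A rough way to see the difference is to count how many times the GameView would be encoded on the Unity side in each setup. The model below is a deliberate simplification and an assumption, not measured URS behaviour: it supposes one encode per directly connected viewer, versus a single encode that a forwarding server then relays unchanged to every viewer. `directBroadcast` and `sfuBroadcast` are hypothetical helper names.

```typescript
// Simplified, assumed model: counts operations only, not actual CPU cost.
interface BroadcastLoad {
  unityEncodes: number; // encodes performed by the Unity instance
  sfuForwards: number; // streams relayed by the forwarding server
}

// Assumption: each directly connected viewer gets its own encode.
function directBroadcast(viewers: number): BroadcastLoad {
  return { unityEncodes: viewers, sfuForwards: 0 };
}

// Assumption: Unity publishes once; the SFU relays without re-encoding.
function sfuBroadcast(viewers: number): BroadcastLoad {
  return { unityEncodes: viewers > 0 ? 1 : 0, sfuForwards: viewers };
}

console.log(directBroadcast(4)); // { unityEncodes: 4, sfuForwards: 0 }
console.log(sfuBroadcast(4)); // { unityEncodes: 1, sfuForwards: 4 }
```

Under these assumptions the Unity-side work stays constant as viewers are added, and the fan-out cost moves to the forwarding server.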

Are you in agreement?

avicarpio avatar Oct 31 '22 11:10 avicarpio

@avicarpio How about the video resolution?

karasusan avatar Nov 01 '22 02:11 karasusan

I used 1920x1080, as it is the minimum resolution I need for my app

avicarpio avatar Nov 01 '22 08:11 avicarpio

@avicarpio That is very large. Have you used a hardware video codec?
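For a sense of scale, here is a back-of-the-envelope sketch of the uncompressed data volume at 1920x1080. The 4 bytes per pixel (8-bit RGBA) and the 30 FPS frame rate are assumptions for illustration; neither figure is stated in this thread.

```typescript
const WIDTH = 1920;
const HEIGHT = 1080;
const BYTES_PER_PIXEL = 4; // assumption: 8-bit RGBA
const ASSUMED_FPS = 30; // assumption: frame rate is not stated in the thread

// Size of one uncompressed frame in bytes.
function rawBytesPerFrame(width: number, height: number, bytesPerPixel: number): number {
  return width * height * bytesPerPixel;
}

// Uncompressed bytes per second across several 1920x1080 streams.
function rawBytesPerSecond(streams: number, fps: number): number {
  return streams * fps * rawBytesPerFrame(WIDTH, HEIGHT, BYTES_PER_PIXEL);
}

console.log(rawBytesPerFrame(WIDTH, HEIGHT, BYTES_PER_PIXEL)); // 8294400
console.log(rawBytesPerSecond(4, ASSUMED_FPS)); // 995328000
```

Under these assumptions, four streams amount to roughly 1 GB of raw pixel data per second before any encoding or decoding.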

karasusan avatar Nov 02 '22 02:11 karasusan

I am using URP + OpenGL on Linux x64, and only with Chrome browsers. I've tested on an RTX 3080 Ti and on a Tesla T4 on AWS and got the same result. Hardware encoding should be working in these environments, shouldn't it? The output codec is set to "default", but I've also tested with H264 ConstrainedBaseline and with VP8, with the same CPU load on both. As I said, disabling the video decoding/encoding processes doesn't decrease the CPU load, because the bidirectional signaling is what causes the lag. If I set up one signaling for inputs and another for outputs, with two Node.js servers, the lag disappears...

Screenshot from 2022-11-02 09-09-25

Screenshot from 2022-11-02 09-26-41 Screenshot from 2022-11-02 09-26-53

avicarpio avatar Nov 02 '22 08:11 avicarpio

You can see the docs for more detail on video codecs. https://docs.unity3d.com/Packages/[email protected]/manual/video-streaming.html

> If I set one signaling for inputs, and other for outputs, having two nodejs servers, the lag disappears...

That is strange... I believe the signaling process shouldn't cause the lag.

karasusan avatar Nov 08 '22 02:11 karasusan