UHD output performance to UltraStudio 4K Mini is unusable on macOS Intel, perfect on M1

Open PhotoJoseph opened this issue 3 years ago • 29 comments

Operating System Info

macOS 12

Other OS

No response

OBS Studio Version

Other

OBS Studio Version (Other)

27

OBS Studio Log URL

https://obsproject.com/logs/9PmYBd-le9uIxrSR

OBS Studio Crash Log URL

No response

Expected Behavior

Smooth rendering of key/fill playback on output

Current Behavior

Stuttered output that drops in performance quickly. Set to 29.97 it's playing maybe 20 fps but quickly drops to 1 fps and lower.

Steps to Reproduce

  1. macOS 12.4 on an Intel Mac Pro, OBS 27.2.4
  2. Install a Thunderbolt Blackmagic UltraStudio 4K Mini
  3. Set OBS to a 3840x2160 29.97 canvas and output, and enable RGB in advanced settings
  4. Enable keyed output for the UltraStudio. Play something that has an alpha channel and motion
  5. Performance starts slow and gets worse. Totally unusable.

Anything else we should know?

  1. Exact same setup (Intel Mac Pro) but using Decklink 8K card instead of UltraStudio works perfectly fine
  2. Exact same setup (UltraStudio 4K Mini) but using M1 MacBook Air instead of Mac Pro works perfectly fine

PhotoJoseph avatar Oct 07 '22 01:10 PhotoJoseph

Just to point out the inevitable: reproducing this issue requires access to pretty expensive hardware, so it may take some time before someone with the exact same setup is able to tackle it. Expect quite some inactivity before this gets fixed.

PatTheMav avatar Oct 09 '22 12:10 PatTheMav

Understood. I could potentially allow remote access to my system for testing if someone knows what to look for.

PhotoJoseph avatar Oct 09 '22 14:10 PhotoJoseph

I believe @Fenrirthviti has both an UltraStudio 4K Mini and an Intel Mac (but no M1 machine).

derrod avatar Oct 09 '22 14:10 derrod

I do have the UltraStudio, but as mentioned, I only have access to an Intel mac at present.

Fenrirthviti avatar Oct 13 '22 03:10 Fenrirthviti

That is where the problem is; on Intel. It works fine on M1.

PhotoJoseph avatar Oct 13 '22 03:10 PhotoJoseph

I'm testing the other issue in the next week or so, so I'll run through this one at the same time.

Fenrirthviti avatar Oct 13 '22 03:10 Fenrirthviti

Can confirm this issue also exists in some form on Windows with the UltraStudio 4K Mini. 1080p60 Decklink output works perfectly, but 2160p60 results in a garbled picture. Rolled all the way back to OBS 29.0.2 and the issue is still present.

polyh3dron avatar Jul 07 '23 07:07 polyh3dron

Approaching a year since this issue was introduced. Still no movement?

polyh3dron avatar Aug 04 '23 09:08 polyh3dron

Apologies, I thought I had provided an update but got pulled away on other issues and forgot about this.

I wasn't able to replicate this issue, but I also don't have enough understanding of what is actually expected to happen, or of how to investigate further myself. Either I am doing something wrong in the test that I don't understand, or I am not experiencing this issue myself.

Fenrirthviti avatar Aug 07 '23 03:08 Fenrirthviti

Are you saying you are able to use the Decklink Output at 2160p60 with version 29.1.3? Version 28.1.2 technically "works" for me, albeit with an insane amount of encoding lag even though neither the CPU nor the GPU is under full load. Version 29.1.3 outputs a corrupted image through the Decklink Output at 2160p60, although the encoding lag is no longer a problem. 1080p60 works fine. 2160p60 Decklink Output seems to have been broken after version 28. Every Decklink Output mode up to 2Kp60 DCI works; 2160p60 and 4Kp60 DCI output a corrupted image.

polyh3dron avatar Aug 07 '23 05:08 polyh3dron

Here's the image with the Decklink Output setting at 1080p60:

https://i.imgur.com/NeZEAwD.jpg

Here's the same image with the Decklink Output setting at 2160p60:

https://i.imgur.com/5w8xhkZ.jpg

polyh3dron avatar Aug 07 '23 05:08 polyh3dron

I've tested again, and I can't get 2160p60 to work in any version of OBS, it always shows a garbled output.

1080p60 appears to work ok.

My input source is a PS5, just for reference.

Fenrirthviti avatar Aug 08 '23 00:08 Fenrirthviti

> I've tested again, and I can't get 2160p60 to work in any version of OBS, it always shows a garbled output.
>
> 1080p60 appears to work ok.
>
> My input source is a PS5, just for reference.

Try rolling back to OBS 28.1.2. You should get a proper image, but also get crippling encoding lag.

polyh3dron avatar Aug 08 '23 16:08 polyh3dron

Now that we've confirmed this is a real issue, how do we get the necessary attention to get it fixed?

polyh3dron avatar Aug 09 '23 20:08 polyh3dron

As with anything, someone with the requisite experience and knowledge, time, and desire to work on it. Unfortunately, that is not me, as I don't have any experience with the DeckLink SDK or development in this space of the program.

Fenrirthviti avatar Aug 09 '23 20:08 Fenrirthviti

Apparently @DDRBoxman is the author of this part of OBS Studio. Tagging him here for some visibility.

polyh3dron avatar Aug 09 '23 21:08 polyh3dron

2160p60 Decklink Output with an Ultrastudio 4K Mini is still broken in OBS 30 Beta 1. Tagging @jpark37

polyh3dron avatar Aug 18 '23 04:08 polyh3dron

I have an Intel Mac Pro but not a 4K studio mini so I can’t test this setup.

DDRBoxman avatar Aug 18 '23 16:08 DDRBoxman

> I have an Intel Mac Pro but not a 4K studio mini so I can’t test this setup.

I've also repro'd this on a Windows PC. @Fenrirthviti has repro'd it as well as seen in the messages here.

polyh3dron avatar Aug 30 '23 06:08 polyh3dron

I also have an Ultra Studio 4K Mini and cannot get it to output a 2160p60 signal. The display on the device shows a distorted image. 1080p60 works as mentioned before. I get the same result with a MacBook Pro M2 Pro, a Mac Studio M1 Max or a Lenovo Thinkpad X1 Carbon. The OBS versions I tried are 29.1 and 28.1. What can I do to help solve this issue?

P.S. I also tried different software (Mimolive) with a similar result, so this might be connected to the BMD driver itself. I have tested a few driver versions but never had any success, and since my Ultra Studio is rather new I have never seen a working combination that I could revert to.

kallegrabowski69 avatar Oct 04 '23 14:10 kallegrabowski69

Same issue here! I don't understand how it can work in some programs (like Pro Tools, Premiere, Final Cut, etc.) and not in others. I can only guess that OBS and the other broken apps use an older BMD SDK version that did not support 2160p30 and above.

@polyh3dron seems totally weird to have it work on v28.1.2. Others including me had no luck with it.

miagg avatar Mar 12 '24 13:03 miagg

Sadly, OBS 30.1.0 does not include a fix. I'm pretty convinced the culprit is the ffmpeg library, which is used to output to Decklink devices. Maybe different flags are needed to switch to quad link SDI (12G), or even a new binary?

miagg avatar Mar 14 '24 16:03 miagg

Okay so this has turned into two issues, I'm making a new issue for the distorted image: https://github.com/obsproject/obs-studio/issues/10380

Performance issues should stay on this existing thread.

DDRBoxman avatar Mar 15 '24 04:03 DDRBoxman

We're hitting a performance bottleneck here when we download the texture after the GPU scale, which shows up at larger resolutions.

https://github.com/obsproject/obs-studio/blob/21f1c155ef33f176c4065868a6edc7951708ee49/UI/frontend-plugins/decklink-output-ui/decklink-ui-main.cpp#L420
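For a sense of scale, here is a rough back-of-the-envelope sketch of how much data a per-frame texture download has to move. It assumes 4 bytes per pixel (an 8-bit RGBA/BGRA surface, which is an assumption and not something confirmed in this thread), and the function names are hypothetical.

```python
from fractions import Fraction

BYTES_PER_PIXEL = 4  # assumption: 8-bit RGBA/BGRA staging surface


def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def bytes_per_second(width, height, fps):
    """Data that must leave the GPU each second if every frame is read back."""
    return frame_bytes(width, height) * Fraction(fps)


if __name__ == "__main__":
    modes = {
        "1080p60": (1920, 1080, 60),
        "2160p29.97": (3840, 2160, Fraction(30000, 1001)),
        "2160p60": (3840, 2160, 60),
    }
    for name, (w, h, fps) in modes.items():
        rate = bytes_per_second(w, h, fps)
        print(f"{name}: {frame_bytes(w, h)} bytes/frame, {float(rate) / 1e6:.0f} MB/s")
```

Under these assumptions a UHD frame is four times the size of a 1080p frame (about 33 MB versus about 8 MB), so any per-byte cost in the download path grows by the same factor.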

DDRBoxman avatar Mar 17 '24 21:03 DDRBoxman

I'm going to try to see if I can get some sort of texture ping pong setup working here so we aren't blocking on this call

DDRBoxman avatar Mar 20 '24 02:03 DDRBoxman

We're already doing that. Not sure why, but the texture download off the GPU seems to be blocking when it should be async 🤔
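As a toy illustration of the texture ping-pong idea, the sketch below models it in plain Python. It is not OBS or graphics API code: the class and method names are hypothetical, and "staging" a frame here only means storing a value in a slot. The point is the indexing: each new frame goes into one slot while the read targets the oldest slot, so in principle the read never touches the surface that was just written.

```python
class PingPongReadback:
    """Toy model: stage the newest frame in one slot, read an older one from another."""

    def __init__(self, slots=2):
        if slots < 2:
            raise ValueError("ping-pong needs at least two slots")
        self._slots = [None] * slots
        self._count = 0  # frames submitted so far

    def submit(self, frame):
        """Stage `frame`; return the frame staged (slots - 1) submissions ago, or None."""
        n = len(self._slots)
        write_slot = self._count % n
        read_slot = (self._count + 1) % n  # oldest slot, next to be overwritten
        ready = self._slots[read_slot]
        self._slots[write_slot] = frame
        self._count += 1
        return ready


if __name__ == "__main__":
    readback = PingPongReadback(slots=2)
    for frame_number in range(4):
        print(frame_number, "->", readback.submit(frame_number))
```

The trade-off in this model is one frame of added latency per extra slot. Whether a real driver lets the map return without waiting is a separate question that the model does not answer.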

DDRBoxman avatar Mar 29 '24 18:03 DDRBoxman

I may have run into a similar situation on Linux. In my case, though, the performance bottleneck is at the memcpy after the texture is mapped. The memcpy takes 3 ms, which seems rather slow on a PCIe 3.0 GPU for a 1920x1080 image. My OpenGL is not that fluent, but either that way of downloading textures is not efficient, or there are outstanding GPU commands that have to complete first.

If there is no way to speed up this blocking copy, it should happen asynchronously. Besides, is it really necessary to download the texture from the GPU here? Isn't this already done by libobs somewhere else in order to feed normal video output plugins?
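To put the 3 ms figure in context, here is a small sketch that converts it into an effective copy rate and projects it onto a UHD frame. It assumes 4 bytes per pixel and a copy time that scales linearly with size; both are assumptions, and the function names are hypothetical.

```python
def effective_bandwidth(num_bytes, seconds):
    """Observed copy throughput in bytes per second."""
    return num_bytes / seconds


def copy_time(num_bytes, bandwidth):
    """Time in seconds to copy num_bytes at a constant bandwidth."""
    return num_bytes / bandwidth


if __name__ == "__main__":
    bpp = 4  # assumption: 4 bytes per pixel
    hd = 1920 * 1080 * bpp
    uhd = 3840 * 2160 * bpp
    bw = effective_bandwidth(hd, 0.003)  # 3 ms, as reported above
    print(f"effective bandwidth: {bw / 1e9:.2f} GB/s")
    print(f"projected UHD copy: {copy_time(uhd, bw) * 1e3:.1f} ms")
    print(f"frame budget at 60 fps: {1000 / 60:.1f} ms")
```

That works out to roughly 2.8 GB/s, well below the nominal rate of a PCIe 3.0 x16 link (roughly 16 GB/s, if the GPU sits on such a link). At the same rate a UHD frame would take about 12 ms, which is most of a 16.7 ms frame budget at 60 fps.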

ipatix avatar May 09 '25 14:05 ipatix

Okay, I did some more experiments. So it doesn't appear to be an issue with outstanding GPU commands or anything. If I only copy half the data, the copy will be twice as fast. So my conclusion is that for whatever reason reading from the mapped texture memory is just slow. Perhaps I'll try to use a normal glGetTexImage to see if it's faster.

Edit: Okay, I've tried glGetTexImage. But it is only marginally faster (~0.5 ms improvement). So I still have no idea for a good solution.

Edit 2: Okay, interestingly Windows doesn't appear to have this issue. Or at least the reading after ID3D11DeviceContext::Map is about 3 to 4 times as fast as with glMapBuffer (on the same hardware). Not sure what different behavior we are running into there.
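The reasoning behind the half-size experiment can be sketched as a two-point fit of a simple cost model, time = fixed overhead + bytes / bandwidth. A pending GPU sync would show up as fixed overhead, whereas a slow read from mapped memory shows up as low bandwidth. The timings below are illustrative (3 ms for the full copy and half that for half the data), the 4 bytes per pixel is an assumption, and the function name is hypothetical.

```python
def fit_linear_cost(bytes_a, seconds_a, bytes_b, seconds_b):
    """Fit time = overhead + bytes / bandwidth through two measurements.

    Returns (overhead_seconds, bandwidth_bytes_per_second).
    """
    bandwidth = (bytes_a - bytes_b) / (seconds_a - seconds_b)
    overhead = seconds_a - bytes_a / bandwidth
    return overhead, bandwidth


if __name__ == "__main__":
    full = 1920 * 1080 * 4  # assumption: 4 bytes per pixel
    # Illustrative timings: 3 ms for the full copy, half that for half the data.
    overhead, bandwidth = fit_linear_cost(full, 0.003, full // 2, 0.0015)
    print(f"fixed overhead: {overhead * 1e3:.3f} ms")
    print(f"bandwidth: {bandwidth / 1e9:.2f} GB/s")
```

When the time halves along with the data, the fitted overhead comes out at essentially zero, which matches the conclusion that the copy is limited by read throughput and not by a wait.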

ipatix avatar May 10 '25 21:05 ipatix

There are quite a few copies involved:

  • First a frame from the video cache is copied into the output_frame
  • Then the rendered texture is copied into a stage texture (with the expectation that the GPU will do that copy immediately, which will cause trouble at some point)
  • Then the texture data is downloaded into a buffer (either a PIXEL_PACK_BUFFER in OpenGL or a D3D11_MAPPED_SUBRESOURCE in Direct3D)
  • In theory this mapping (and thus the download of texture data) should not block, because a separate stage texture was created which should neither be sampled from in any shader nor written to by any draw calls, though maybe it is blocked until the copy operation has actually taken place (see above - might be API specific)
  • And finally the texture data is copied into the output_frame again

Because the pointers (and associated memory) holding the texture data are created by the graphics APIs, the APIs own that data, and once gs_stagesurface_unmap is called, they can and will deallocate the memory at some point. So that final copy is required and cannot be avoided, but the first copy (of the cached frame data into the new frame) is unnecessary, as the data is supposed to be fully replaced by the staged texture data anyway.
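The argument about the redundant first copy can be shown with a toy model. This is plain Python over byte buffers, not the OBS code path: the function and parameter names are hypothetical, and it assumes the cached frame and the staged data are the same size and that the staged data overwrites the whole frame.

```python
def fill_output_frame(cached_frame, staged_data, copy_cached_first):
    """Build an output frame; return (frame_bytes, number_of_cpu_copies)."""
    if len(cached_frame) != len(staged_data):
        raise ValueError("toy model assumes equally sized buffers")
    output = bytearray(len(staged_data))
    copies = 0
    if copy_cached_first:
        output[:] = cached_frame  # first copy: cached frame into the output frame
        copies += 1
    output[:] = staged_data  # final copy: mapped texture data replaces everything
    copies += 1
    return bytes(output), copies


if __name__ == "__main__":
    cached = bytes([0x11]) * 16
    staged = bytes([0xEE]) * 16
    with_first, n_with = fill_output_frame(cached, staged, copy_cached_first=True)
    without_first, n_without = fill_output_frame(cached, staged, copy_cached_first=False)
    print(with_first == without_first, n_with, n_without)
```

In this model both variants produce the same output frame, and skipping the first copy halves the number of full-frame CPU copies from two to one.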

PatTheMav avatar May 28 '25 18:05 PatTheMav