Bug with latest AMD PRO drivers 22Q4

Open sandercox opened this issue 2 years ago • 9 comments

Expected behaviour

Just works.

Current behaviour

Crashes on the first call to caspar::accelerator::ogl::texture::impl::copy_to


Steps to reproduce

  1. Install AMD Pro drivers 22Q4
  2. Start CasparCG server and load a color (LOAD RED)

Environment

  • Commit: 2.3.0, 2.3.3 and master
  • Server version: [e.g. v2.2]
  • Operating system: [e.g. Windows 11]

I played a bit with the code here and it seems that it's due to the format / type not working properly.

When I change format to GL_RGBA and type to GL_UNSIGNED_BYTE I get results but Red and Green channels are swapped. But any call with GL_BGRA

Guess it's a driver issue but maybe there is a workaround in CasparCG. Reported that with the AMD driver software 🤞🏼

sandercox avatar Nov 29 '22 13:11 sandercox

Just want to bump this issue, it still occurs with the latest AMD drivers. Effectively, CasparCG is unusable right now with an AMD graphics card if the drivers are newer than 2020 or so.

jpc0 avatar May 29 '23 10:05 jpc0

Seems fine on Ubuntu 22.04 with OpenGL 4.6 (Core Profile) and Mesa 22.2.5: the onboard AMD GPU of a 7950X is running without issue.
Unless I install Windows on this machine to figure out this one bug, I am not able to do anything on this myself.

When I change format to GL_RGBA and type to GL_UNSIGNED_BYTE I get results but Red and Green channels are swapped. But any call with GL_BGRA

I wonder if both of those changes are necessary?
Changing GL_BGRA to GL_RGBA will likely have large implications elsewhere. For example, the decklink driver accepts BGRA or ARGB, so while we could probably composite in RGBA, we would have to convert it to BGRA at some point.
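As a rough illustration of what "convert it to BGRA at some point" could mean on the CPU side, here is a minimal sketch. The helper name `swap_red_blue` is hypothetical and not part of CasparCG, and it assumes tightly packed 8-bit-per-channel pixels with no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of CasparCG: swaps the first and third byte
// of every 4-byte pixel in place, turning RGBA into BGRA (or back again).
// Assumes tightly packed pixels with no padding between rows.
void swap_red_blue(std::vector<std::uint8_t>& pixels)
{
    assert(pixels.size() % 4 == 0 && "expected 4 bytes per pixel");
    for (std::size_t i = 0; i + 3 < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]); // green and alpha stay where they are
    }
}
```

A per-frame CPU pass like this costs memory bandwidth, so in practice the swap would more likely be done in a shader or a texture swizzle; the sketch only shows the byte shuffle involved.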

It sounds like GL_UNSIGNED_INT_8_8_8_8_REV vs GL_UNSIGNED_BYTE could have no impact. Based on https://stackoverflow.com/questions/7786187/opengl-texture-upload-unsigned-byte-vs-unsigned-int-8-8-8-8, it looks like it is a performance optimisation, but as all the architectures we may want to run on are little-endian, changing it might have no effect? https://github.com/renpy/renpy/issues/16 backs up that suspicion of it being a performance optimisation. The one source link still working (Apple) talks about GL_RGBA and GL_UNSIGNED_BYTE, but doesn't say if that extends to GL_BGRA.
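The little-endian point can be checked without a GPU. The sketch below assumes the usual packed-pixel rule that GL_UNSIGNED_INT_8_8_8_8_REV stores the first component in the lowest 8 bits; the function names are made up for illustration. It only compares memory layouts and says nothing about how a particular driver handles either type.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Packs one BGRA pixel the way GL_UNSIGNED_INT_8_8_8_8_REV lays it out:
// first component (B) in bits 0-7, last component (A) in bits 24-31.
std::uint32_t pack_bgra_8888_rev(std::uint8_t b, std::uint8_t g, std::uint8_t r, std::uint8_t a)
{
    return static_cast<std::uint32_t>(b)
         | (static_cast<std::uint32_t>(g) << 8)
         | (static_cast<std::uint32_t>(r) << 16)
         | (static_cast<std::uint32_t>(a) << 24);
}

// Returns the packed pixel's bytes in the order they sit in host memory.
std::array<std::uint8_t, 4> bytes_in_memory(std::uint32_t packed)
{
    std::array<std::uint8_t, 4> out{};
    std::memcpy(out.data(), &packed, out.size());
    return out;
}

// True on hosts that store the least significant byte first.
bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host, `bytes_in_memory(pack_bgra_8888_rev(b, g, r, a))` comes out as B, G, R, A, which is the same byte sequence GL_BGRA with GL_UNSIGNED_BYTE describes. On a big-endian host the two would differ.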

So if someone can confirm whether this works with GL_BGRA and GL_UNSIGNED_BYTE on these AMD GPUs, then it should be possible to make that change.

Julusian avatar May 29 '23 11:05 Julusian

So this breaks on Windows. Changing the texture type to GL_RGBA (including in the screen consumer, to fix the colours while I was testing) seems to fix this. I came to ask if there is any reason why GL_BGRA is used, but you have answered that.

Changing to GL_UNSIGNED_BYTE from GL_UNSIGNED_INT_8_8_8_8_REV does not solve the problem. It does seem like the AMD driver is bugged only with GL_BGRA and from what I can tell GL_BGR as well.

jpc0 avatar May 29 '23 14:05 jpc0

To note, there isn't an OpenGL error thrown; the AMD driver itself throws an exception.

jpc0 avatar May 29 '23 15:05 jpc0

I am testing on Windows 11; it was the same on Windows 10 when I was still running that.

jpc0 avatar May 29 '23 15:05 jpc0

To note, just changing to GL_RGBA and GL_RGB fixed the crash for me, I did not need to change GL_UNSIGNED_INT_8_8_8_8_REV

jpc0 avatar May 29 '23 15:05 jpc0

I have also tried the other two APIs (GetTexInfo and Getntexinfo?) and they also crash.

Seems like something deep in the AMD driver is broken with BGRA on windows.

Will want to check if writing to the texture crashes as well.

Will check that and will see if there is any other way to do the copy.

Maybe the OpenCL dream may yet come

jpc0 avatar May 29 '23 15:05 jpc0

So it's only these calls that fail https://registry.khronos.org/OpenGL-Refpages/gl4/html/glGetTexImage.xhtml, the glTextureSubImage2D call was fine.

jpc0 avatar May 29 '23 19:05 jpc0

Confirming that this is apparently still an issue, although all I can see is that CasparCG quits suddenly and Windows Event Viewer shows the following. Happens no matter what the consumer is and no matter if it's a media file or simply instructing CasparCG to output a colour. At least for media I can see that ffmpeg gets to the point where it probably would start outputting image data and bang, it's quit. I can't get any more debug info out of CasparCG at this time:

Faulting application name: casparcg.exe, version: 2.3.2.0, time stamp: 0x604fb45a
Faulting module name: ntdll.dll, version: 10.0.19041.3393, time stamp: 0xfeef31d3
Exception code: 0xc0000374

My system:

  • Windows 10 IoT 21H2 LTSC
  • Ryzen 5 2400G with AMD Radeon(TM) RX Vega 11 Graphics
  • Driver 23.19.02-230831a-396094C-AMD-Software-Adrenalin-Edition

Full version details:

APU - AMD Radeon(TM) RX Vega 11 Graphics - Primary/Integrated
VRAM - 2048 MB - DDR4 1467 MHz
Driver Version - 23.19.02-230831a-396094C-AMD-Software-Adrenalin-Edition
AMD Windows Driver Version - 31.0.21902.5
Direct3D API Version - 12.1
Vulkan™ API Version - 1.3.260
OpenCL™ API Version - 2.0
OpenGL® API Version - 4.6
Direct3D® Driver Version - 9.14.10.01526
Vulkan™ Driver Version - 2.0.279
OpenCL® Driver Version - 31.0.21902.5
OpenGL® Driver Version - 23.08.230729_569461f
2D Driver Version - 8.1.1.1634
2D Driver File Path - /REGISTRY/MACHINE/SYSTEM/CurrentControlSet/Control/Class/{4d36e968-e325-11ce-bfc1-08002be10318}/0000
UI Version - 2023.0831.1020.1996
AMD Audio Driver Version - 10.0.1.23
Driver Provider - Advanced Micro Devices, Inc.
Windows Edition - Windows 10 EnterpriseSN (64 bit)
Windows Version - 21H2

Using GLView I can see "GL_EXT_abgr" and "GL_EXT_bgra" listed under extensions, but how that relates to anything I have no idea.

rob-fi avatar Oct 05 '23 20:10 rob-fi