compute-runtime

Hardware Accelerated Tonemapping for Tiger Lake broken after 21.49.21786

Open 88fingerslukee opened this issue 2 years ago • 14 comments

I'm running Plex in an Ubuntu Docker image on an i7-1165G7.

Every version of the drivers after 21.49.21786 breaks HW HDR->SDR tonemapping. I'm not sure what other info to provide beyond this, but I can give whatever is needed to help here.

88fingerslukee avatar Jan 14 '22 18:01 88fingerslukee

Same issue for me. I tested every version above that one up to the latest; the last working version is 21.49.21786.

System: VMware ESXi 7 host with an Intel(R) Core(TM) i7-9700K CPU, GPU passed through to an Ubuntu Server 20.04.3 LTS VM (fully updated), plexmediaserver 1.25.3.5409.

poneli avatar Jan 18 '22 11:01 poneli

Try using Intel's official repo: https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-focal.html

sudo apt-get install -y gpg-agent wget
wget -qO - https://repositories.intel.com/graphics/intel-graphics.key |
  sudo apt-key add -
sudo apt-add-repository \
  'deb [arch=amd64] https://repositories.intel.com/graphics/ubuntu focal main'

Then

sudo apt-get update
sudo apt-get install \
  intel-opencl-icd \
  intel-level-zero-gpu level-zero \
  intel-media-va-driver-non-free libmfx1

Ge082 avatar Jan 20 '22 20:01 Ge082

This doesn't work. It still has the same error. The only thing that works is downgrading to 21.49.21786.

88fingerslukee avatar Jan 24 '22 19:01 88fingerslukee

I have the same problem and likewise my solution so far has been to downgrade to 21.49.21786.

alanshelley avatar Jan 25 '22 03:01 alanshelley

Any updates to this issue?

kevindd992002 avatar Mar 14 '22 23:03 kevindd992002

Can anyone please give an update on this issue? Is it fixed with the latest version of the runtime libraries?

kevindd992002 avatar Apr 26 '22 09:04 kevindd992002

Give it a try and report back.

poneli avatar Apr 27 '22 08:04 poneli

I already found a Plex thread mentioning that it is still not fixed.

kevindd992002 avatar Apr 27 '22 11:04 kevindd992002

I was able to reproduce the problem locally on a NUC with KBL. With driver 21.49.21786 I see hardware accelerated transcoding with CPU usage at about 40%; with driver 21.50.21939 I see very slow software transcoding with CPU usage at about 350% (4 cores). Based on that observation I isolated the regression to commit https://github.com/intel/compute-runtime/commit/34d9d9b0d389077a2df5434dd9277e8f257a8568, which is "gmmlib revision update". Importantly, this commit changes the libigdgmm version on the dependency list from 11 to 12. It was not clear why this broke Neo in Plex, so I created a wrapper for "/usr/lib/plexmediaserver/Plex Transcoder" to execute it under strace. It turned out that Plex uses its own copy of libigdgmm.so:

open("/usr/lib/plexmediaserver/lib/dri/../libigdgmm.so.plex", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 5

and most probably, since it is an older version, it is no longer compatible with Neo (libigdrcl.so). So the "Plex Transcoder" process first loads libigdgmm.so.plex, then Neo (libigdrcl.so) tries to load libigdgmm.so.12, and the two most probably collide. As a quick experiment I renamed libigdgmm.so.plex and linked libigdgmm.so.12 in its place:

sudo mv /usr/lib/plexmediaserver/lib/libigdgmm.so.plex /usr/lib/plexmediaserver/lib/libigdgmm.so.plex_ORIG
sudo ln -s /usr/local/lib/libigdgmm.so.12 /usr/lib/plexmediaserver/lib/libigdgmm.so.plex

and OpenCL accelerated tone mapping works again. So the root cause is conflicting libigdgmm.so libraries loaded in a single process, and I don't see how this could be solved in Neo. Maybe Plex should use the libigdgmm that is already installed in the system instead of its own copy?
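For anyone repeating the strace step, the relevant lines can be pulled out of a saved log with a small filter. This is only a sketch: the function name list_gmm_opens and the log path in the usage note are made up for illustration, and it assumes the log contains ordinary open()/openat() lines like the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read an strace log on stdin and print every distinct
# libigdgmm path that was opened successfully.
list_gmm_opens() {
    # 1. keep open()/openat() calls, 2. keep those naming libigdgmm,
    # 3. drop failed calls (strace prints "= -1 ERRNO" for them),
    # 4. extract the first quoted string, which is the path.
    grep -E 'open(at)?\(' \
        | grep 'libigdgmm' \
        | grep -v -- '= -1 ' \
        | sed -E 's/^[^"]*"([^"]*)".*$/\1/' \
        | sort -u
}
```

Usage, assuming the log was written with strace's -o option: list_gmm_opens < /tmp/plex-transcoder.strace. More than one distinct libigdgmm path in the output for a single process would match the collision described above.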

pwilma avatar Jun 08 '22 15:06 pwilma

Plex forum thread about this issue: https://forums.plex.tv/t/anyone-have-been-able-to-hw-transcode-on-an-intel-nuc-11-iris-xe/695381/505

pwilma avatar Jul 15 '22 13:07 pwilma

This might only be because it's dockerized but this was manifesting as segfaults for me:

Plex Transcoder[4286]: segfault at 7fd216bf46b8 ip 00007fcb62905818 sp 00007ffd972c3050 error 4 in libigdgmm.so.plex[7fcb62877000+90000]

The magic fix on Ubuntu 20.04 (focal) with PMS in Docker (plexinc/pms-docker) was a combination of this set of packages (from the Intel apt repo):

apt-get install -y \
    intel-igc-cm=1.0.128+i699.3~u20.04 \
    intel-opencl-icd=21.49.21786+i643~u20.04 \
    libigc1=1.0.10409+i699.3~u20.04 \
    libigdfcl1=1.0.10409+i699.3~u20.04 \
    libigdgmm11=21.3.3+i643~u20.04

... and rolling back to this version of the Docker image: plexinc/pms-docker:1.28.0.5999-97678ded3

Hardware is a NUC 10 with i3-10110U.

rbranson avatar Sep 08 '22 14:09 rbranson

The plex issue seems to be solved with 1.29.0.6209-9fa696df6 and the latest intel drivers.

jalaziz avatar Sep 15 '22 17:09 jalaziz

> The plex issue seems to be solved with 1.29.0.6209-9fa696df6 and the latest intel drivers.

Are you sure? I haven't tested it myself, but I've seen reports from people saying it's still not working.

rnsc avatar Sep 15 '22 17:09 rnsc

> > The plex issue seems to be solved with 1.29.0.6209-9fa696df6 and the latest intel drivers.
>
> Are you sure? I haven't tested it myself, but I've seen reports from people saying it's still not working.

I thought it wasn't at first, but after upgrading everything and restarting HW HDR->SDR started working again.

jalaziz avatar Sep 15 '22 17:09 jalaziz

I tested the latest image version: a 4K to 4K transcode buffers with HDR tonemapping on, but works well with HDR tonemapping off. In both cases the dashboard shows HW transcoding. The hardware is a J5005 CPU with integrated GPU. CPU usage is the same with HDR tonemapping on or off (about 30%).

f3rr avatar Oct 11 '22 21:10 f3rr

I installed a new version of Plex (plexmediaserver_1.29.2.6364-6d72b0cf6_amd64.deb) to confirm whether HW accelerated tone mapping has been fixed there. Indeed, it now works correctly with hardware acceleration (at least in the experiments I did).

I looked more closely at how Plex Transcoder loads libraries. It first loads iHD_drv_video.so with dlopen(), and iHD_drv_video.so is linked against libigdgmm.so.plex:


intel@intel-NUC7i3BNK:~/plex$ ldd /usr/lib/plexmediaserver/lib/dri/iHD_drv_video.so
        linux-vdso.so.1 (0x00007fff797d2000)
        libgcompat.so.0 => /usr/lib/plexmediaserver/lib/dri/../libgcompat.so.0 (0x00007f151d528000)
        libigdgmm.so.plex => /usr/lib/plexmediaserver/lib/dri/../libigdgmm.so.plex (0x00007f151d46e000)
        libc.so => /usr/lib/plexmediaserver/lib/dri/../libc.so (0x00007f151d3cb000)

I tried to intercept dlopen() calls to check which flags are used. My assumption was that if libigdgmm.so.plex were loaded with the RTLD_LOCAL flag, it should not collide with symbols from the libigdgmm loaded by compute runtime. I wrote a simple shared library with its own definition of dlopen() to load with LD_PRELOAD. It allowed me to print the flags and then call the original dlopen(). This approach correctly captured iHD_drv_video.so and even other libs, but unfortunately it was not able to intercept libigdgmm.so.plex, probably because it is not loaded with dlopen() but is a link-time dependency. I can see that iHD_drv_video.so is loaded with the following flags:

RTLD_NOW
RTLD_GLOBAL
RTLD_NODELETE

I tried overriding RTLD_GLOBAL with RTLD_LOCAL, but unfortunately that resulted in a fallback to the software path for tone mapping, so it looks like a dead end.
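If an interposer like the one described prints the raw integer passed to dlopen(), it has to be decoded by hand. A minimal sketch of such a decoder follows. The function name is hypothetical, and the constants are an assumption: they are the values I believe glibc and musl use in <dlfcn.h> on x86-64 Linux, so check the headers on the target system. The thread itself only lists the symbolic names, not the number.

```shell
#!/bin/sh
# Hypothetical helper: decode a numeric dlopen() flags value into names.
# ASSUMPTION: bit values as defined by glibc/musl <dlfcn.h> on x86-64 Linux.
decode_dlopen_flags() {
    flags=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    if [ $(( flags & 0x1 )) -ne 0 ]; then echo RTLD_LAZY; fi
    if [ $(( flags & 0x2 )) -ne 0 ]; then echo RTLD_NOW; fi
    if [ $(( flags & 0x4 )) -ne 0 ]; then echo RTLD_NOLOAD; fi
    if [ $(( flags & 0x100 )) -ne 0 ]; then
        echo RTLD_GLOBAL
    else
        # RTLD_LOCAL is 0, so it is simply the absence of RTLD_GLOBAL.
        echo RTLD_LOCAL
    fi
    if [ $(( flags & 0x1000 )) -ne 0 ]; then echo RTLD_NODELETE; fi
}
```

Under those assumed constants, decode_dlopen_flags 0x1102 prints RTLD_NOW, RTLD_GLOBAL and RTLD_NODELETE, which is the combination listed above.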

Nevertheless, the new Plex version I used (plexmediaserver_1.29.2.6364-6d72b0cf6_amd64.deb) did indeed use hardware accelerated transcoding even with a newer GPU driver installed in the system, but it quickly turned out that this is because Plex uses its own copy of the GPU UMD driver, as can be seen in the strace log:

stat("/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache/CL-ICDs/icr.icd", {st_mode=S_IFREG|0644, st_size=111, ...}) = 0
open("/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Cache/CL-ICDs/icr.icd", O_RDONLY|O_LARGEFILE) = 7
lseek(7, 0, SEEK_END)                   = 111
lseek(7, 0, SEEK_CUR)                   = 111
lseek(7, 0, SEEK_SET)                   = 0
readv(7, [{iov_base="/var/lib/plexmediaserver/Library"..., iov_len=110}, {iov_base="\n", iov_len=1024}], 2) = 111
open("/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Drivers/icr-9-linux-x86_64/libigdrcl.so", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 8

Further analysis showed that this compute runtime library is most probably built by Plex, as it contains many 'plex' strings and even a modified reference to libigdgmm:

intel@intel-NUC7i3BNK:/usr/lib/plexmediaserver/lib$ strings  /var/lib/plexmediaserver/Library/Application\ Support/Plex\ Media\ Server/Drivers/icr-9-linux-x86_64/libigdrcl.so | grep libigdgmm
libigdgmm.so.plex

For the original compute runtime in the system it looks a bit different:

intel@intel-NUC7i3BNK:/usr/lib/plexmediaserver/lib$ cat /etc/OpenCL/vendors/intel.icd
/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
intel@intel-NUC7i3BNK:/usr/lib/plexmediaserver/lib$ strings /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so | grep libigdgmm
libigdgmm.so.12
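The two strings checks above can be folded into one comparison. The sketch below uses grep -a -o rather than strings so that it does not depend on binutils being installed; the function names gmm_refs and same_gmm are hypothetical, and it only looks for the library name embedded in the file, not at how the runtime actually loads it.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct libigdgmm library names embedded
# in a binary (treated as text via -a, matches only via -o).
gmm_refs() {
    grep -a -o -E 'libigdgmm\.so(\.[A-Za-z0-9_]+)*' "$1" | sort -u
}

# Hypothetical helper: succeed (exit status 0) only if both binaries
# reference the same set of libigdgmm names.
same_gmm() {
    [ "$(gmm_refs "$1")" = "$(gmm_refs "$2")" ]
}
```

For example, same_gmm run on the Plex copy of libigdrcl.so and on /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so would be expected to fail on the system shown above, since one names libigdgmm.so.plex and the other libigdgmm.so.12.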

The runtime library from Plex also contains many strings with paths, and these even contain the Neo version used:

/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/os_interface/device_factory.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/os_interface/metrics_library.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/opencl/source/program/process_device_binary.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/os_interface/linux/os_context_linux.cpp
/data/jenkins/conan_build/1112411750/conan/.conan/data/intel-compute-runtime/22.16.22992-2/plex/stable/build/96953c44c3aa1e3d81776274f854d01e9cbd0473/compute-runtime-22.16.22992/shared/source/page_fault_manager/linux/cpu_page_fault_manager_linux.cpp

So the conclusion is that the new version of Plex uses its own compiled compute runtime, version 22.16.22992-2, and is thus independent of the Neo installed in the system. Because it is now shipped as part of the Plex package and, as the experiments showed, hardware accelerated tone mapping works fine with this Plex version, we may in my opinion close this issue.
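The version can also be read out of the embedded build paths without scanning the strings output by eye. A rough sketch, with a hypothetical function name: it assumes the binary contains source paths of the form compute-runtime-X.Y.Z/... as in the listing above, which a stripped or differently built library may not.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct compute-runtime versions found in
# build paths embedded in a binary, e.g. ".../compute-runtime-22.16.22992/...".
neo_build_versions() {
    grep -a -o -E 'compute-runtime-[0-9]+\.[0-9]+\.[0-9]+' "$1" \
        | sed 's/^compute-runtime-//' \
        | sort -u
}
```

Running it on the libigdrcl.so under the Plex Drivers directory should print one line per distinct version; the package revision suffix (the "-2" in 22.16.22992-2) is not part of that path component, so it is not reported.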

pwilma avatar Nov 24 '22 13:11 pwilma

Plex 1.29.x in Docker worked initially. Later versions, including 1.29.2.x, stopped working for me. I now have 1.30.0.6486 in Docker on Ubuntu Server, and HDR tone mapping is still software only.

gobigdave avatar Jan 06 '23 01:01 gobigdave

I checked once again with the most recent Plex version, which is now plexmediaserver_1.31.2.6739-a87e876bd_amd64.deb. I still see hardware accelerated tone mapping working correctly, and Plex still uses its self-compiled compute runtime located in "/var/lib/plexmediaserver/Library/Application Support/Plex Media Server/Drivers/icr-9-linux-x86_64". Based on strings from the libigdrcl.so library, the compute runtime version used there is 22.16.22992-2. As a quick experiment I renamed libigdrcl.so in that directory, and indeed hardware acceleration was not used in that case, which confirms that Plex uses its own version of the Intel compute runtime for hardware acceleration.

Based on that, I'm closing this issue, as we cannot guarantee quality if an application vendor ships its own (potentially modified) version of compute runtime. I suggest contacting Plex support directly for issues like this one.

pwilma avatar Mar 03 '23 11:03 pwilma