Rocm on AMD iGPU: Darktable 4.6.0 does not terminate when window is closed (with opencl enabled)

Open badshah400 opened this issue 1 year ago • 22 comments

Describe the bug

Launch the darktable application and import images (or process a pre-imported image in darkroom). Then close the darktable window. The window closes. Try launching the application again (after a few minutes or so): darktable complains that the database is locked by a previous process. At this point, one may open any process monitoring app or run pidof darktable to see that the previous darktable process has indeed not terminated. It finally needs to be killed.

Alternatively, launch darktable from terminal, close window after importing images and note in the terminal that process has not terminated.

In either case, DT terminates gracefully when launched with opencl disabled (e.g. darktable --disable-opencl from the terminal, import images, close window --> process terminated just fine).
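The pidof darktable check mentioned above can also be scripted. What follows is only a minimal sketch, assuming a Linux /proc filesystem; the function name `find_pids_by_name` and its `proc_root` parameter are made up for illustration and are not part of darktable.

```python
import os


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose <proc_root>/<pid>/comm equals `name`.

    Assumption: Linux-style /proc layout. The kernel truncates comm to
    15 characters, which is long enough for "darktable".
    """
    if not os.path.isdir(proc_root):
        return []  # no /proc available (e.g. not Linux)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo" or "self"
        comm_path = os.path.join(proc_root, entry, "comm")
        try:
            with open(comm_path, encoding="utf-8", errors="replace") as fh:
                comm = fh.read().strip()
        except OSError:
            continue  # process vanished between listdir() and open()
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    leftover = find_pids_by_name("darktable")
    print("lingering darktable PIDs:", leftover or "none")
```

Running this after closing the window should print "none" if darktable really terminated; any PID listed is a leftover process that still holds the database lock.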

Steps to reproduce

  1. Launch darktable from terminal
  2. Import some images to library.
  3. Close darktable application window.
  4. Note in the terminal that process does not terminate.
  5. Wait, hit Ctrl-C eventually to kill process.
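The steps above can be sketched as a small script that launches darktable, waits for the window to be closed by hand, and then checks whether the process exits within a timeout. This is a rough sketch only: `wait_for_exit` and `reproduce` are hypothetical helper names, the 60 second timeout is an arbitrary assumption, and `darktable -d opencl` is simply the command used elsewhere in this report.

```python
import subprocess


def wait_for_exit(proc, timeout_s):
    """Return True if `proc` exits within `timeout_s` seconds, else False."""
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False


def reproduce(timeout_s=60.0):
    """Interactive helper (call it manually): launch darktable, time its shutdown."""
    proc = subprocess.Popen(["darktable", "-d", "opencl"])
    input("Import some images, close the darktable window, then press Enter: ")
    if wait_for_exit(proc, timeout_s):
        print("darktable exited with code", proc.returncode)
    else:
        # Scripted equivalent of step 5: give up and kill the process.
        print("darktable is still running; killing it")
        proc.kill()
        proc.wait()  # reap the child so no zombie is left behind
```

Calling `reproduce()` from a Python prompt should report a clean exit with `--disable-opencl` style setups and fall through to the kill branch when the hang described here occurs.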

Expected behavior

Darktable should terminate when its window is closed.

Logfile | Screenshot | Screencast

I ran darktable -d opencl because the problem seems related to opencl being enabled (by default on my system). I attach the log here: https://paste.opensuse.org/pastes/4101b25588e9

Here is a screencast to show the issue in case this helps. Screencast from 2023-12-22 22-14-25.webm

Commit

No response

Where did you obtain darktable from?

OBS

darktable version

4.6.0

What OS are you using?

Linux

What is the version of your OS?

openSUSE Tumbleweed

Describe your system?

  • Linux x86_64
  • GNOME 45.2
  • Wayland
  • AMD GPU

Are you using OpenCL GPU in darktable?

Yes

If yes, what is the GPU card and driver?

AMD Radeon Graphics, 4 GiB Memory, Driver Version [as reported by clinfo] 3602.0 (HSA1.1,LC)

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

  • Can you reproduce with another darktable version(s)?
    • Darktable 4.4.2 worked just fine, even with opencl enabled, on the same machine. Same drivers, etc.
  • Is the issue still present using an empty/new config-dir (e.g. start darktable with --configdir "/tmp")?
    • Yes, I emptied out my cache and config dirs before launching the new version of DT as shown in attached screencast.

badshah400 avatar Dec 22 '23 17:12 badshah400

Your log is very short, it only shows the first part while initializing opencl. Could you run with -d opencl -d pipe please?

jenshannoschwalm avatar Dec 22 '23 17:12 jenshannoschwalm

Here is the additional log when -d opencl -d pipe is used (i.e. same as above plus the following):

    19.9840 dt_dev_pixelpipe_synch_all [full]                                  defaults 0.0719s, history 0.0195s
    19.9841 pixelpipe_cache_checkmem   [full]                                  64 lines (important=0, used=0). Freed 0MB. Using using 0MB, limit=436MB
    19.9841 pixelpipe starting CL      [full]                                  (   0/   0) 2079x1386 scale=0.3799 --> (   0/   0) 2079x1386 scale=0.3799 device=0 (amdacceleratedparallelprocessinggfx90cxnack)
    19.9841 [dt_opencl_check_tuning] use 2526MB (headroom=OFF, pinning=OFF) on device `AMD Accelerated Parallel Processing gfx90c:xnack-' id=0
    19.9842 pixelpipe data: clip&zoom  [full]                                  (   0/   0) 5472x3648 scale=1.0000 --> (   0/   0) 2079x1386 scale=0.3799 
    20.2892 pixelpipe process CL       [full]           colorin                (   0/   0) 2079x1386 scale=0.3799 --> (   0/   0) 2079x1386 scale=0.3799 IOP_CS_RGB -> IOP_CS_LAB

I close the window right after and the process just hangs there in the terminal until I hit Ctrl-C.
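To see where such a log stalls, one can pull out the leading timestamps and look at the last entry and the longest pause. The following is a minimal sketch, assuming every relevant line starts with a seconds value such as 19.9840 as in the excerpt above; `parse_log` and `largest_gap` are illustrative names, not darktable tooling.

```python
import re

# Assumed line shape: optional indent, seconds as a decimal number, then the message.
_LINE = re.compile(r"^\s*(\d+\.\d+)\s+(.*\S)\s*$")


def parse_log(text):
    """Return a list of (timestamp_seconds, message) for timestamped lines."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            # Collapse the column padding inside the message to single spaces.
            message = " ".join(match.group(2).split())
            entries.append((float(match.group(1)), message))
    return entries


def largest_gap(entries):
    """Return (gap_seconds, message_logged_after_the_gap), or None if < 2 entries."""
    best = None
    for (t_prev, _), (t_next, msg) in zip(entries, entries[1:]):
        gap = t_next - t_prev
        if best is None or gap > best[0]:
            best = (gap, msg)
    return best
```

For the excerpt above, the last parsed entry is the colorin step on the CL pipe, which is the last thing logged before the process stops responding.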

badshah400 avatar Dec 22 '23 17:12 badshah400

This also happens if you just sit&wait for a minute or so? No more info in the log? If so please retry with -d opencl -d pipe -d verbose.

jenshannoschwalm avatar Dec 22 '23 18:12 jenshannoschwalm

This also happens if you just sit&wait for a minute or so? No more info in the log? If so please retry with -d opencl -d pipe -d verbose.

Yes, it just freezes there (waited for 2-3 mins). So does darktable while opening the image actually. Here is the full log from -d opencl -d pipe -d verbose: https://paste.opensuse.org/pastes/95ae9eea4696

badshah400 avatar Dec 22 '23 18:12 badshah400

Do you know what GPU you have? Is this an iGPU?

And what is your current linux kernel?

gi-man avatar Dec 22 '23 18:12 gi-man

Could you also a) remove the opencl compiler option - the line with cl-fast-relaxed-math in your darktable config file and b) remove the cl kernels again. c) your opencl compiler settings in preferences. Only the amd driver active?

jenshannoschwalm avatar Dec 22 '23 18:12 jenshannoschwalm

Do you know what GPU you have? Is this an iGPU?

Yes, this is an AMD Radeon integrated GPU: "AMD Ryzen 5 PRO 5650U with Radeon Graphics"

Could you also a) remove the opencl compiler option - the line with cl-fast-relaxed-math in your darktable config file and b) remove the cl kernels again. c) your opencl compiler settings in preferences. Only the amd driver active?

a) Same result, freezes b) Removed the .cache/darktable/*.bin (assuming that is what you meant here). Start dt, same result. c) By default it enables three: Intel GPU, Nvidia Cuda, AMD Rocm. I de-selected the three now to test, but that does not help either.

Screenshot from 2023-12-23 00-20-23

badshah400 avatar Dec 22 '23 18:12 badshah400

Just to make sure - you have disabled all three driver options now? After restarting i guess they are still visible and active?

jenshannoschwalm avatar Dec 22 '23 18:12 jenshannoschwalm

And what is your current linux kernel?

Sorry, missed this: Linux kernel 6.6.6 (devil's number, I know!)

Just to make sure - you have disabled all three driver options now? After restarting i guess they are still visible and active?

No, I only disabled the Intel and Nvidia options. I left the AMD Rocm option enabled.

After restarting DT, just the AMD Rocm is enabled as I would expect.

badshah400 avatar Dec 22 '23 18:12 badshah400

If I disable all three, on next restart DT reports "No openCL device found" and reverts to CPU processing, which as reported originally, works fine.

badshah400 avatar Dec 22 '23 19:12 badshah400

Removed the .cache/darktable/*.bin (assuming that is what you meant here). Start dt, same result.

Nope. The whole .cache/darktable/kernel ... directory.

Sorry, missed this: Linux kernel 6.6.6 (devil's number, I know!)

Oh - there have been some pretty bad issues reported with opencl on amd and the 6.6 kernel ...

jenshannoschwalm avatar Dec 22 '23 19:12 jenshannoschwalm

I think iGPU are not officially supported by rocm. They might work.

There is also a report that linux kernel 6.6.x /rocm is causing issues. https://www.mail-archive.com/[email protected]/msg06837.html

gi-man avatar Dec 22 '23 19:12 gi-man

OK, thanks a lot for digging up that report for me. I guess it might be the issue with kernel 6.6.x reported there for me too, although thankfully I can at least kill -9 the frozen DT process. Will disable opencl and work for now.

Immensely grateful for almost live-debugging this with me. Many thanks @gi-man, @jenshannoschwalm .

badshah400 avatar Dec 22 '23 19:12 badshah400

Ok, maybe you change the issue title to something mentioning AMD and kernel 6.6 for other users to "stumble across"?

jenshannoschwalm avatar Dec 22 '23 19:12 jenshannoschwalm

Ok, maybe you change the issue title to something mentioning AMD and kernel 6.6 for other users to "stumble across"?

I also find the same issue with kernel 6.4 (just tested now), so I updated the title to say Rocm on AMD iGPU in general. Hope that is ok.

badshah400 avatar Dec 22 '23 19:12 badshah400

There is one interesting issue in your log. It says you have dedicated graphics memory. Will check if that test is wrong in dt code...

jenshannoschwalm avatar Dec 22 '23 19:12 jenshannoschwalm

I can confirm. ROCm with kernel 6.6 makes DT freeze on exit. Moreover, in my case (GFX803) OpenCL doesn't work at all, falling back to CPU via timeout on every operation. On kernel 6.5 OpenCL works fine.

kanyck avatar Dec 23 '23 12:12 kanyck

Can you please check in dmesg if you get the following errors? https://gitlab.freedesktop.org/drm/amd/-/issues/3037#note_2213169

Germano0 avatar Dec 23 '23 15:12 Germano0

Can you please check in dmesg if you get the following errors? https://gitlab.freedesktop.org/drm/amd/-/issues/3037#note_2213169

I do not see any similar messages, but mine is not a discrete GPU. I also have this issue with DT when booting with kernel 6.4.x, as I have noted previously, so I daresay yours is a separate issue altogether.

badshah400 avatar Dec 23 '23 17:12 badshah400

Switched back to DT 4.4.2 and OpenCL processing works great, even on kernel 6.6.6, but still does not terminate unless killed. Here is the output of darktable-cltest. Note DEVICE_TYPE: GPU, dedicated mem in 4.6.0 vs DEVICE_TYPE: GPU in 4.4.2. Btw, in 4.4.2 the processing speed when OpenCL is enabled is super fast compared to CPU-only speeds, so, notwithstanding that it is only an integrated chip, the GPU does make a difference in my case.

@jenshannoschwalm So, https://github.com/darktable-org/darktable/issues/15931#issuecomment-1868020730 is probably it.

     0.0308 [dt_get_sysresource_level] switched to 1 as `default'
     0.0308   total mem:       27884MB
     0.0308   mipmap cache:    3485MB
     0.0308   available mem:   13942MB
     0.0308   singlebuff:      217MB
     0.0308   OpenCL tune mem: OFF
     0.0308   OpenCL pinned:   OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'gfx90c:xnack-'
   PLATFORM NAME & VENDOR:   AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc.
   CANONICAL NAME:           amdacceleratedparallelprocessinggfx90cxnack
   DRIVER VERSION:           3602.0 (HSA1.1,LC)
   DEVICE VERSION:           OpenCL 2.0 
   DEVICE_TYPE:              GPU
   GLOBAL MEM SIZE:          4096 MB
   MAX MEM ALLOC:            3482 MB
   MAX IMAGE SIZE:           16384 x 16384
   MAX WORK GROUP SIZE:      256
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 1024 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   MEMORY TUNING:            NO
   FORCED HEADROOM:          400
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH:            16
   ROUNDUP HEIGHT:           16
   CHECK EVENT HANDLES:      128
   PERFORMANCE:              1.937
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /usr/share/darktable/kernels
   KERNEL DIRECTORY:         /home/badshah/.cache/darktable/cached_v1_kernels_for_AMDAcceleratedParallelProcessinggfx90cxnack_36020HSA11LC
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   KERNEL LOADING TIME:       0.0306 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init]		0	'AMD Accelerated Parallel Processing gfx90c:xnack-'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	-1	0	0	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	-1	0	0	-1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[opencl_synchronization_timeout] synchronization timeout set to 200

EDIT: GPU with dedicated memory reported by 4.6.0 and just "GPU" reported with version 4.4.2, not the other way around.
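Comparing the device blocks of two darktable-cltest runs by eye is error-prone, so here is a small sketch that turns the [dt_opencl_device_init] block into a dictionary and lists the fields that differ. It assumes the "KEY: value" layout shown above and only reads the first device block; `parse_device_block` and `diff_fields` are hypothetical helper names.

```python
def parse_device_block(log_text):
    """Collect 'KEY: value' pairs that follow a [dt_opencl_device_init] line."""
    fields = {}
    in_block = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("[dt_opencl_device_init]"):
            in_block = True
            continue
        if not in_block:
            continue
        if line.startswith("["):
            break  # the next bracketed log line ends the device block
        # Split on the first colon only, so values like "0: 'gfx90c:xnack-'" survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_fields(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Feeding it the 4.4.2 output above and the corresponding 4.6.0 output should surface the DEVICE_TYPE difference directly, along with anything else that changed between the two versions.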

badshah400 avatar Dec 24 '23 09:12 badshah400

I do not see any similar messages, but mine is not a discrete GPU.

It's not about discrete vs. integrated GPUs; in fact, my message at that URL was about an integrated AMD GPU.

Germano0 avatar Dec 24 '23 13:12 Germano0

This issue has been marked as stale due to inactivity for the last 60 days. It will be automatically closed in 300 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.

github-actions[bot] avatar Feb 29 '24 00:02 github-actions[bot]

Fwiw, never got rocm opencl to work, but I have OpenCL working on DT with RustiCL and it is doing pretty well.

Mesa 24.0.3 in case anyone is interested.

Please feel free to close this issue and thanks for the awesome app once again.

badshah400 avatar Apr 04 '24 09:04 badshah400