darktable
Rocm on AMD iGPU: Darktable 4.6.0 does not terminate when window is closed (with opencl enabled)
Describe the bug
Launch darktable and import images (or process a previously imported image in the darkroom). Then close the darktable window. The window closes. Try launching the application again (after a few minutes or so): darktable complains that the database is locked by the previous process. At this point, one may open any process-monitoring app or run pidof darktable to see that the previous darktable process has indeed not terminated. It eventually has to be killed.
Alternatively, launch darktable from a terminal, close the window after importing images, and note in the terminal that the process has not terminated.
In either case, DT terminates gracefully when launched with OpenCL disabled (e.g. run darktable --disable-opencl from the terminal, import images, close the window: the process terminates just fine).
Steps to reproduce
- Launch darktable from a terminal.
- Import some images into the library.
- Close the darktable application window.
- Note in the terminal that the process does not terminate.
- Wait, then eventually hit Ctrl-C to kill the process.
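A minimal Python sketch of the check described above, for Linux only: it does roughly what pidof darktable does, by reading /proc/<pid>/comm for every process. The function name pids_by_name and its proc_root parameter are hypothetical helpers for illustration, not part of darktable.

```python
import os


def pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name` (Linux-only sketch)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        comm_path = os.path.join(proc_root, entry, "comm")
        try:
            with open(comm_path) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited between listdir() and open(), or no permission
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

Calling pids_by_name("darktable") a few seconds after closing the window returns a non-empty list if the process is still alive. Note that /proc/<pid>/comm holds at most 15 characters of the process name, which is enough for "darktable".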
Expected behavior
Darktable should terminate when its window is closed.
Logfile | Screenshot | Screencast
I ran darktable -d opencl because the problem seems related to OpenCL being enabled (the default on my system). The log is here: https://paste.opensuse.org/pastes/4101b25588e9
Here is a screencast to show the issue in case this helps. Screencast from 2023-12-22 22-14-25.webm
Commit
No response
Where did you obtain darktable from?
OBS
darktable version
4.6.0
What OS are you using?
Linux
What is the version of your OS?
openSUSE Tumbleweed
Describe your system?
- Linux x86_64
- GNOME 45.2
- Wayland
- AMD GPU
Are you using OpenCL GPU in darktable?
Yes
If yes, what is the GPU card and driver?
AMD Radeon Graphics, 4 GiB memory, driver version (as reported by clinfo) 3602.0 (HSA1.1,LC)
Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip
- Can you reproduce with another darktable version(s)?
- Darktable 4.4.2 worked just fine, even with opencl enabled, on the same machine. Same drivers, etc.
- Is the issue still present using an empty/new config-dir (e.g. start darktable with --configdir "/tmp")?
- Yes, I emptied out my cache and config dirs before launching the new version of DT, as shown in the attached screencast.
Your log is very short; it only shows the first part, while OpenCL is being initialized. Could you please run with -d opencl -d pipe?
Here is the additional log when -d opencl -d pipe is used (i.e. same as above plus the following):
19.9840 dt_dev_pixelpipe_synch_all [full] defaults 0.0719s, history 0.0195s
19.9841 pixelpipe_cache_checkmem [full] 64 lines (important=0, used=0). Freed 0MB. Using using 0MB, limit=436MB
19.9841 pixelpipe starting CL [full] ( 0/ 0) 2079x1386 scale=0.3799 --> ( 0/ 0) 2079x1386 scale=0.3799 device=0 (amdacceleratedparallelprocessinggfx90cxnack)
19.9841 [dt_opencl_check_tuning] use 2526MB (headroom=OFF, pinning=OFF) on device `AMD Accelerated Parallel Processing gfx90c:xnack-' id=0
19.9842 pixelpipe data: clip&zoom [full] ( 0/ 0) 5472x3648 scale=1.0000 --> ( 0/ 0) 2079x1386 scale=0.3799
20.2892 pixelpipe process CL [full] colorin ( 0/ 0) 2079x1386 scale=0.3799 --> ( 0/ 0) 2079x1386 scale=0.3799 IOP_CS_RGB -> IOP_CS_LAB
I close the window right after and the process just hangs there in the terminal until I hit Ctrl-C.
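In the pipe log above, the last line is the colorin module being processed on the OpenCL device ("pixelpipe process CL [full] colorin"), and nothing follows it, which suggests (but does not prove) that the stall happens in OpenCL work at or after that step. A small hypothetical helper like the following can pull that last step out of a longer log; the regular expression is an assumption based only on the log lines quoted in this thread.

```python
import re

# Assumed line shape, taken from the log above:
# "20.2892 pixelpipe process CL [full] colorin ( 0/ 0) 2079x1386 ..."
STEP_RE = re.compile(r"^\s*(\d+\.\d+)\s+pixelpipe process (\S+) \[(\w+)\] (\S+)")


def last_pixelpipe_step(log_text):
    """Return (timestamp, device, pipe, module) of the last 'pixelpipe process' line, or None."""
    last = None
    for line in log_text.splitlines():
        m = STEP_RE.match(line)
        if m:
            last = (float(m.group(1)), m.group(2), m.group(3), m.group(4))
    return last
```

Applied to the lines above, it reports the colorin step at 20.2892 s on CL in the full pipe.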
Does this also happen if you just sit and wait for a minute or so? No more info in the log? If so, please retry with -d opencl -d pipe -d verbose.
Yes, it just freezes there (I waited 2-3 minutes). Actually, darktable also freezes while opening the image. Here is the full log from -d opencl -d pipe -d verbose: https://paste.opensuse.org/pastes/95ae9eea4696
Do you know what GPU you have? Is this an iGPU?
And what is your current linux kernel?
Could you also a) remove the OpenCL compiler option (the line with cl-fast-relaxed-math in your darktable config file), b) remove the cl kernels again, and c) check your OpenCL compiler settings in preferences: is only the AMD driver active?
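For step a), a hedged Python sketch that drops every line mentioning cl-fast-relaxed-math from the config file, keeping a .bak copy first. It assumes the usual location ~/.config/darktable/darktablerc and that no darktable process is running (darktable may rewrite the file when it exits); the function names are hypothetical.

```python
import shutil
from pathlib import Path

OPTION = "cl-fast-relaxed-math"  # the compiler option mentioned above


def strip_option_lines(config_text, option=OPTION):
    """Return config_text with every line that mentions `option` removed."""
    kept = [line for line in config_text.splitlines(keepends=True) if option not in line]
    return "".join(kept)


def strip_option_in_file(path, option=OPTION):
    """Copy the file to <name>.bak, then rewrite it without the matching lines."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(strip_option_lines(path.read_text(), option))
```

Usage, under the path assumption above: strip_option_in_file(Path.home() / ".config" / "darktable" / "darktablerc").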
Do you know what GPU you have? Is this an iGPU?
Yes, this is an AMD Radeon integrated GPU: "AMD Ryzen 5 PRO 5650U with Radeon Graphics"
Could you also a) remove the OpenCL compiler option (the line with cl-fast-relaxed-math in your darktable config file), b) remove the cl kernels again, and c) check your OpenCL compiler settings in preferences: is only the AMD driver active?
a) Same result: it freezes.
b) Removed the .cache/darktable/*.bin files (assuming that is what you meant here). Started dt: same result.
c) By default it enables three: Intel GPU, Nvidia Cuda, AMD Rocm. I de-selected the three now to test, but that does not help either.
Just to make sure: you have disabled all three driver options now? After restarting, I guess they are still visible and active?
And what is your current linux kernel?
Sorry, missed this: Linux kernel 6.6.6 (devil's number, I know!)
Just to make sure: you have disabled all three driver options now? After restarting, I guess they are still visible and active?
No, I only disabled the Intel and Nvidia options. I left the AMD Rocm option enabled.
After restarting DT, just the AMD Rocm is enabled as I would expect.
If I disable all three, on next restart DT reports "No openCL device found" and reverts to CPU processing, which as reported originally, works fine.
Removed the .cache/darktable/*.bin (assuming that is what you meant here). Start dt, same result.
Nope. The whole .cache/darktable/kernel ... directory.
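Removing the whole compiled-kernel directory can be scripted as below. This is a sketch: the cached_v1_kernels_for_ prefix is taken from the KERNEL DIRECTORY line of the darktable-cltest output posted in this thread, other darktable versions may use a different name, and the helper name is hypothetical. Close (or kill) darktable first.

```python
import shutil
from pathlib import Path

# Prefix copied from the KERNEL DIRECTORY line of the darktable-cltest output in this thread;
# other darktable versions may name the directory differently.
KERNEL_DIR_PREFIX = "cached_v1_kernels_for_"


def remove_cached_kernels(cache_dir):
    """Delete each compiled-kernel directory in cache_dir and return the removed names."""
    removed = []
    for entry in sorted(Path(cache_dir).glob(KERNEL_DIR_PREFIX + "*")):
        if entry.is_dir():
            shutil.rmtree(entry)  # removes the directory and every cached kernel in it
            removed.append(entry.name)
    return removed
```

Usage: remove_cached_kernels(Path.home() / ".cache" / "darktable"). Other entries in the cache directory are left untouched.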
Sorry, missed this: Linux kernel 6.6.6 (devil's number, I know!)
Oh - there have been some pretty bad issues around related to opencl on amd and 6.6 kernel ...
I think iGPUs are not officially supported by ROCm, though they might work.
There is also a report that Linux kernel 6.6.x with ROCm is causing issues. https://www.mail-archive.com/[email protected]/msg06837.html
OK, thanks a lot for digging up that report for me. I guess the kernel 6.6.x issue reported there might affect me too, although thankfully I can at least kill -9 the frozen DT process. I will disable OpenCL and work that way for now.
Immensely grateful for almost live-debugging this with me. Many thanks @gi-man, @jenshannoschwalm.
OK, maybe you could change the issue title to something mentioning AMD and kernel 6.6, so other users can "stumble across" it?
I also see the same issue with kernel 6.4 (just tested now), so I updated the title to say "Rocm on AMD iGPU" in general. Hope that is OK.
There is one interesting detail in your log: it says you have dedicated graphics memory. I will check whether that test in the dt code is wrong...
I can confirm. ROCm with kernel 6.6 makes DT freeze on exit. Moreover, in my case (GFX803) OpenCL doesn't work at all, falling back to CPU via timeout on every operation. On kernel 6.5 OpenCL works fine.
Can you please check dmesg for the following errors? https://gitlab.freedesktop.org/drm/amd/-/issues/3037#note_2213169
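To scan the kernel log for GPU driver messages, a rough Python filter such as this can help. The keywords "amdgpu" and "kfd" are assumptions, and the actual error text to compare against is the one in the linked GitLab note. Reading dmesg may require root when kernel.dmesg_restrict is set; the function names are hypothetical.

```python
import subprocess


def read_dmesg():
    """Return the kernel ring buffer as text; may need root if kernel.dmesg_restrict=1."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def gpu_driver_lines(dmesg_text, keywords=("amdgpu", "kfd")):
    """Return the lines that mention any of the keywords, case-insensitively."""
    wanted = [k.lower() for k in keywords]
    return [line for line in dmesg_text.splitlines()
            if any(k in line.lower() for k in wanted)]
```

Usage: for line in gpu_driver_lines(read_dmesg()): print(line)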
I do not see any similar messages, but mine is not a discrete GPU. I also have this issue with DT when booting with kernel 6.4.x, as I have noted previously, so I daresay yours is a separate issue altogether.
I switched back to DT 4.4.2, and OpenCL processing works great, even on kernel 6.6.6, but DT still does not terminate unless killed. Here is the output of darktable-cltest. Note "DEVICE_TYPE: GPU, dedicated mem" in 4.6.0 vs "DEVICE_TYPE: GPU" in 4.4.2. Btw, in 4.4.2 processing with OpenCL enabled is much faster than CPU-only, so even though it is only an integrated chip, the GPU does make a difference in my case.
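To compare two darktable-cltest runs (e.g. 4.4.2 vs 4.6.0) field by field, here is a minimal sketch that groups the uppercase "KEY: value" lines into one dict per DEVICE block. The parsing rule is an assumption based only on the output format shown in this thread, and both function names are hypothetical.

```python
def parse_cltest_devices(text):
    """Group the uppercase 'KEY: value' lines of darktable-cltest output per DEVICE block."""
    devices = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # lines without a colon, e.g. "[opencl_init] found 1 device"
        key, value = key.strip(), value.strip()
        if key == "DEVICE":
            devices.append({"DEVICE": value})  # a new device block starts here
        elif devices and key.isupper():
            devices[-1][key] = value  # e.g. "DEVICE_TYPE", "GLOBAL MEM SIZE"
    return devices


def diff_fields(old, new):
    """Map each key whose value differs to its (old, new) pair."""
    return {k: (old.get(k), new.get(k))
            for k in sorted(old.keys() | new.keys())
            if old.get(k) != new.get(k)}
```

Feeding it the two outputs and calling diff_fields on the first device of each should surface the DEVICE_TYPE difference noted above.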
@jenshannoschwalm So, https://github.com/darktable-org/darktable/issues/15931#issuecomment-1868020730 is probably it.
0.0308 [dt_get_sysresource_level] switched to 1 as `default'
0.0308 total mem: 27884MB
0.0308 mipmap cache: 3485MB
0.0308 available mem: 13942MB
0.0308 singlebuff: 217MB
0.0308 OpenCL tune mem: OFF
0.0308 OpenCL pinned: OFF
[opencl_init] opencl related configuration options:
[opencl_init] opencl: ON
[opencl_init] opencl_scheduling_profile: 'default'
[opencl_init] opencl_library: 'default path'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 400
[opencl_init] opencl library 'libOpenCL' found on your system and loaded
[opencl_init] found 1 platform
[opencl_init] found 1 device
[dt_opencl_device_init]
DEVICE: 0: 'gfx90c:xnack-'
PLATFORM NAME & VENDOR: AMD Accelerated Parallel Processing, Advanced Micro Devices, Inc.
CANONICAL NAME: amdacceleratedparallelprocessinggfx90cxnack
DRIVER VERSION: 3602.0 (HSA1.1,LC)
DEVICE VERSION: OpenCL 2.0
DEVICE_TYPE: GPU
GLOBAL MEM SIZE: 4096 MB
MAX MEM ALLOC: 3482 MB
MAX IMAGE SIZE: 16384 x 16384
MAX WORK GROUP SIZE: 256
MAX WORK ITEM DIMENSIONS: 3
MAX WORK ITEM SIZES: [ 1024 1024 1024 ]
ASYNC PIXELPIPE: NO
PINNED MEMORY TRANSFER: NO
MEMORY TUNING: NO
FORCED HEADROOM: 400
AVOID ATOMICS: NO
MICRO NAP: 250
ROUNDUP WIDTH: 16
ROUNDUP HEIGHT: 16
CHECK EVENT HANDLES: 128
PERFORMANCE: 1.937
TILING ADVANTAGE: 0.000
DEFAULT DEVICE: NO
KERNEL BUILD DIRECTORY: /usr/share/darktable/kernels
KERNEL DIRECTORY: /home/badshah/.cache/darktable/cached_v1_kernels_for_AMDAcceleratedParallelProcessinggfx90cxnack_36020HSA11LC
CL COMPILER OPTION: -cl-fast-relaxed-math
KERNEL LOADING TIME: 0.0306 sec
[opencl_init] OpenCL successfully initialized. Internal numbers and names of available devices:
[opencl_init] 0 'AMD Accelerated Parallel Processing gfx90c:xnack-'
[opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 -1 0 0 -1
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] image preview export thumbs preview2
[dt_opencl_update_priorities] 0 0 0 0 0
[opencl_synchronization_timeout] synchronization timeout set to 200
EDIT: GPU with dedicated memory reported by 4.6.0 and just "GPU" reported with version 4.4.2, not the other way around.
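The device-priority row in the log above ("0 -1 0 0 -1") can be reproduced from the opencl_device_priority string '*/!0,*/*/*/!0,*'. Below is a simplified sketch, assuming five '/'-separated fields in the order image, preview, export, thumbs, preview2 (as in the log header), where '*' means any device and '!N' excludes device N. It is not darktable's actual implementation, ignores any other syntax darktable may accept, and the function name is hypothetical.

```python
# Pipe order as printed in the [dt_opencl_update_priorities] header of the log.
PIPES = ("image", "preview", "export", "thumbs", "preview2")


def resolve_priorities(spec, device_ids):
    """Simplified model: pick the first allowed device per pipe, or -1 if none is left."""
    fields = spec.split("/")
    if len(fields) != len(PIPES):
        raise ValueError(f"expected {len(PIPES)} '/'-separated fields, got {len(fields)}")
    chosen = {}
    for pipe, field in zip(PIPES, fields):
        tokens = [t.strip() for t in field.split(",") if t.strip()]
        excluded = {int(t[1:]) for t in tokens if t.startswith("!")}  # "!0" excludes device 0
        candidates = []
        for t in tokens:
            if t.startswith("!"):
                continue
            ids = device_ids if t == "*" else [int(t)]  # "*" stands for every device
            for d in ids:
                if d in device_ids and d not in excluded and d not in candidates:
                    candidates.append(d)
        chosen[pipe] = candidates[0] if candidates else -1
    return chosen
```

With the single device 0 on this machine, the sketch yields image 0, preview -1, export 0, thumbs 0, preview2 -1, matching the log row; in this model, -1 presumably means no OpenCL device is assigned to the preview pipes.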
I do not see any similar messages, but mine is not a discrete GPU.
It's not about whether the GPU is discrete or not; in fact, my message at that URL was about an integrated AMD GPU.
This issue has been marked as stale due to inactivity for the last 60 days. It will be automatically closed in 300 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.
FWIW, I never got ROCm OpenCL to work, but I have OpenCL working in DT with Rusticl, and it is doing pretty well.
Mesa 24.0.3, in case anyone is interested.
Please feel free to close this issue and thanks for the awesome app once again.