llvm-project
llvm-project copied to clipboard
[Issue]: Fail to compile OpenCL kernels on darktable under Fedora 40 rocm 6.0.2
Problem Description
Since updating from Fedora 39 to Fedora 40, darktable-cltest rocm fails to compile. It worked prior in Fedora 39 KDE. Attached is the error from darktable. It seems to be related to: https://github.com/ROCm/llvm-project/pull/70
Output from darktable-cltest -d verbose
0.4118 [opencl_build_program] could not build program: CL_BUILD_PROGRAM_FAILURE
0.4118 [opencl_build_program] BUILD STATUS: -2
0.4119 BUILD LOG:
0.4119 fatal error: malformed or corrupted AST file: 'could not find file '/usr/lib64/llvm17/bin/../../../lib/clang/17/include/opencl-c-base.h' referenced by AST file '/tmp/comgr-c6b354/include/opencl1.2-c.pch''
1 error generated.
Error: Failed to compile source (from CL or HIP source to LLVM IR).
- darktable version 4.7.0~git1208.6bf26b3c
I checked and opencl-c-base.h
is in /usr/lib/clang/17/include/
and also in /usr/lib/clang/18/include/
Operating System
Fedora 40 KDE X11
CPU
AMD Ryzen 7 5700G with Radeon Graphics
GPU
AMD Radeon VII
ROCm Version
ROCm 6.0.0
ROCm Component
ROCm-CompilerSupport
Steps to Reproduce
See above.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 5700G with Radeon Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 5700G with Radeon Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3800
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 15679652(0xef40a4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 15679652(0xef40a4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 15679652(0xef40a4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx90c
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
Chip ID: 5688(0x1638)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2000
BDFID: 3328
Internal Node ID: 1
Compute Unit: 8
SIMDs per CU: 4
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 471
SDMA engine uCode:: 40
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx90c:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Also the info from rocm-clinfo. I do have an Nvidia GPU and that one compiles without issues (rpmfusion drivers)
Number of platforms: 4
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 3.0 CUDA 12.4.131
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3602.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 Mesa 24.0.7
Platform Name: Clover
Platform Vendor: Mesa
Platform Extensions: cl_khr_icd
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 3.0
Platform Name: rusticl
Platform Vendor: Mesa/X.org
Platform Extensions: cl_khr_byte_addressable_store cl_khr_create_command_queue cl_khr_expect_assume cl_khr_extended_versioning cl_khr_icd cl_khr_il_program cl_khr_spirv_no_integer_wrap_decoration
Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 10deh
Max compute units: 28
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1777Mhz
Address bits: 64
Max memory allocation: 3154821120
Image support: Yes
Max number of images read arguments: 256
Max number of images write arguments: 32
Max image 2D width: 32768
Max image 2D height: 32768
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 16384
Max samplers within kernel: 32
Max size of kernel argument: 32764
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 802816
Global memory size: 12619284480
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: Yes
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x564f9f0c6e00
Name: NVIDIA GeForce RTX 3060
Vendor: NVIDIA Corporation
Device OpenCL C version: OpenCL C 1.2
Driver version: 550.78
Profile: FULL_PROFILE
Version: OpenCL 3.0 CUDA
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon Graphics
Device Topology: PCI[ B#13, D#0, F#0 ]
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 2000Mhz
Address bits: 64
Max memory allocation: 402653184
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 8192
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 536870912
Constant buffer size: 402653184
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 402653184
Max global variable size: 402653184
Max global variable preferred total size: 536870912
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7f12861f1808
Name: gfx90c:xnack-
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3602.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
Platform Name: Clover
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 2
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 2000Mhz
Address bits: 64
Max memory allocation: 2006994944
Image support: No
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 8027979776
Constant buffer size: 67108864
Max number of constant args: 16
Local memory type: Scratchpad
Local memory size: 65536
ERROR: clBuildProgram(-11)
Additional Information
If this is not the correct place to raise this issue, please let me know.