[Issue]: 6.2.0 compilation issue
Problem Description
Hello,
I'm trying to compile version 6.2.0 but I receive an error after a while. I configure the project with these parameters:
mkdir build && cd build
cmake \
  -Wno-dev \
  -G Ninja \
  -D CMAKE_HIP_COMPILER_ROCM_LIB=/opt/rocm/lib \
  -D HIP_LANG=/opt/rocm/lib64 \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_INSTALL_PREFIX=/opt/rocm \
  -D BUILD_DEV=OFF \
  -D INSTANCES_ONLY=ON \
  -D USE_BITINT_EXTENSION_INT4=ON \
  -D BUILD_TESTING=OFF \
  ..
ninja -j16
Part of the error is below; the full error log is attached.
Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o
FAILED: library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o
/opt/rocm/bin/hipcc -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_EXPERIMENTAL_BIT_INT_EXTENSION_INT4 -DCK_USE_WMMA -DCK_USE_XDL -DINSTANCES_ONLY -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/include -I/mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/include -I/mnt/arhiv/rocm/rocm-build/build/composable_kernel/include -isystem /opt/rocm/include -O3 -DNDEBUG -std=c++17 -fPIC -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -fno-offload-uniform-block -mllvm -enable-post-misched=0 -x hip --offload-arch=gfx1100 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 -MD -MT library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o -c /mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp
In file included from /mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp:5:
In file included from /mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp:6:
/mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3.hpp:73:74: error: unused parameter 'a_grid_desc_ak0_m_ak1' [-Werror,-Wunused-parameter]
   73 |     const AGridDesc_AK0_M_K1 a_grid_desc_ak0_m_ak1,
      |                              ^
Operating System
Slackware 15.0 x86_64
CPU
AMD Ryzen 7 3800X 8-Core Processor
GPU
AMD Radeon RX 7900 XT
Other
No response
ROCm Version
ROCm 6.0.0
ROCm Component
Composable Kernel
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
HSA System Attributes
Runtime Version: 1.1
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name: AMD Ryzen 7 3800X 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 3800X 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4560
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32780980(0x1f432b4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32780980(0x1f432b4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32780980(0x1f432b4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1100
Uuid: GPU-6e99eccb20090e4e
Marketing Name: AMD Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2482
BDFID: 3072
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 262
SDMA engine uCode:: 24
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Additional Information
Here is the compilation error log file
@RandUser123sa Slackware is not a supported OS. Please check the following page for supported OS (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html). Please re-open the ticket if you are still seeing the issue with any of the supported OS. Thanks!
The only problem in the compilation is that the compiler stops on warnings because of -Werror,-Wunused-parameter. If those flags are removed, the project compiles successfully. And yes, I won't replace my favorite operating system just because of one package. When I report a problem, I expect someone to point me to how to resolve it, not a guide on how to replace my operating system with one I don't like.
Hi @RandUser123sa , we can try to get this working despite the unsupported OS. Thanks for your patience.
Can you explain exactly what you are trying to do by building CK with those flags? (e.g. The quickest/minimal build of CK with support for INT4s?).
I note that you're not specifying GPU_TARGETS - is that intentional? Your use of INSTANCES_ONLY=ON will build for all GPU targets supported by your compiler, which will take a lot longer. If you're only compiling for the RX 7900 XT, you can specify "-DGPU_TARGETS=gfx1100" as the GPU target.
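For readers who only need a local build for one card, the suggestion above could look roughly like the sketch below. It reuses a subset of the flags from the original report and adds the GPU_TARGETS value mentioned here; the function name ck_configure and the DRY_RUN switch are hypothetical helpers for illustration, not part of CK.

```shell
#!/bin/sh
# Sketch: configure CK for a single GPU architecture instead of every
# architecture the compiler supports. Run it from inside the build directory.
# ck_configure and DRY_RUN are illustrative names, not part of CK.
ck_configure() {
    # Default to gfx1100, the target suggested above for this card.
    target="${1:-gfx1100}"
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} cmake -G Ninja \
        -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
        -D CMAKE_BUILD_TYPE=Release \
        -D CMAKE_INSTALL_PREFIX=/opt/rocm \
        -D INSTANCES_ONLY=ON \
        -D GPU_TARGETS="$target" \
        ..
}
```

Calling `DRY_RUN=1 ck_configure gfx1100` prints the command for review; `ck_configure gfx1100` runs it. A redistributable package for several GPUs, as in the reporter's case, would not want this restriction.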
Hi @jamesxu2, I'm maintaining a small repository of compiled ROCm 6.2 binaries for Slackware users. It's not official. For that reason I did not specify GPU_TARGETS. I remember reading somewhere in the ROCm repositories that only INT16 and INT32 are safe to build, but even when I switch off INT4 with USE_BITINT_EXTENSION_INT4=OFF the project fails to build. Yes, the compilation takes two and a half days. CK 6.1 compiled successfully, but 6.2 does not.
I came to the conclusion that there are 16 compilation warnings like the one above, but because the build uses the flags -Werror,-Wunused-parameter, the compiler exits with an error status on those warnings. I tried removing the flags and repeating the command manually, and the files compiled without problems. When I wrote the report, I was hoping one of the devs would point me to how to change the compilation parameters so that -Werror,-Wunused-parameter are not used.
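To check a count like the "16 warnings" above without rereading the whole log, one option is to tally the bracketed diagnostic tags. This is a minimal sketch that assumes the build output was saved to a file; the function name werror_summary and the log file name are hypothetical, and the pattern only matches clang's "[-Werror,-W<name>]" suffix as it appears in the error above.

```shell
#!/bin/sh
# Sketch: count how often each -Werror diagnostic appears in a saved build log.
# werror_summary is an illustrative helper name.
werror_summary() {
    # -o prints only the matched "[-Werror,-W...]" tag from each line;
    # uniq -c then counts identical tags, most frequent first.
    grep -o -e '\[-Werror,-W[^]]*]' "$1" | sort | uniq -c | sort -rn
}
```

A log could be captured with something like `ninja -k 0 2>&1 | tee build.log` and then summarized with `werror_summary build.log`. Ninja's `-k 0` keeps building after a failure, so one pass collects every failing file instead of stopping at the first.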
Hi @RandUser123sa , thanks for providing that context. I tried this locally and observed that the CK build is prone to crashing under some circumstances including in the case you report. The internal CK team is aware of this, and I'm working with them to figure out the reason for the crash.
Ideally, we'll find the reason that the files causing the -Werror, -Wunused-parameter errors are being built (This is most likely an issue with the build scripts) and find a fix that doesn't require you to manually remove those build flags.
I'm trying to compile version 6.2.0
@RandUser123sa , can you confirm that you're building composable kernel from the /release/rocm-rel-6.2 branch? I notice from your build log that there are only [311] total files which is different from the 783 I observe from the ROCm 6.2 release branch.
@jamesxu2 Yes, I'm using release versions only and I get it with: wget https://github.com/ROCm/composable_kernel/archive/rocm-6.2.0.tar.gz
The file counts differ because the compilation takes so long: when I hit the error, I did not start over from the beginning (i.e. delete the build folder and run cmake again). Instead I compiled the failing file manually by copying the command above, editing it (removing -Werror,-Wunused-parameter), and running it again.
Hi @jamesxu2, I found the problem and resolved it. I think this is a trivial error. After a close look at CMakeLists.txt, line 292 contains include(EnableCompilerWarnings), which causes the errors. It seems this line is meant for dev builds, not for release versions. I modified it to:
if(BUILD_DEV)
  include(EnableCompilerWarnings)
endif()
and the project compiles successfully on Slackware x86_64 current.
This is probably not the right place for that include. Maybe the line include(EnableCompilerWarnings) could be moved to line 448, with the other settings for development builds, to look like:
SET(BUILD_DEV ON CACHE BOOL "BUILD_DEV")
if(BUILD_DEV)
  add_compile_options(-Werror)
  add_compile_options(-Weverything)
  include(EnableCompilerWarnings)
endif()
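For packagers who want to script the first variant (the guard around the include) rather than edit the file by hand, a rough sketch follows. It matches the include line by content, not by line number, since line numbers may differ between releases. The function name guard_compiler_warnings is hypothetical, and this is a local workaround, not an upstream fix.

```shell
#!/bin/sh
# Sketch: print a copy of a CMakeLists.txt in which the bare
# include(EnableCompilerWarnings) line is wrapped in if(BUILD_DEV) ... endif().
# guard_compiler_warnings is an illustrative helper name.
guard_compiler_warnings() {
    awk '
        # Match the include line, allowing leading or trailing blanks.
        /^[ \t]*include\(EnableCompilerWarnings\)[ \t]*$/ {
            print "if(BUILD_DEV)"
            print "  include(EnableCompilerWarnings)"
            print "endif()"
            next
        }
        # Every other line passes through unchanged.
        { print }
    ' "$1"
}
```

Usage would be along the lines of `guard_compiler_warnings CMakeLists.txt > CMakeLists.txt.new && mv CMakeLists.txt.new CMakeLists.txt`, followed by configuring with `-D BUILD_DEV=OFF` as in the original report. In CMake, `if(BUILD_DEV)` is also false when the variable is not defined at that point, so the include is skipped unless BUILD_DEV is explicitly turned on before it.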
But now I have a new issue: I can't build ckProfiler
make -j ckProfiler
make: *** No rule to make target 'ckProfiler'. Stop.
Could you help with this?
Hi @RandUser123sa , thanks for your investigation on the source of the compilation warnings - I'll bring this up internally.
Regarding the ckProfiler target, your use of the INSTANCES_ONLY directive explicitly removes the profiler as build target. See the Changelog:
"INSTANCES_ONLY" -- Only builds CK library and instances without tests, examples, or profiler
The following command, without the extraneous flags, should make ckProfiler show up as a build target:
cmake -DCMAKE_HIP_COMPILER_ROCM_LIB=/opt/rocm/lib -DHIP_LANG=/opt/rocm/lib64 -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/rocm ..
Hi @jamesxu2, just to report one more problem. When I configure with cmake -G Ninja as above and then try to compile ckProfiler with make -j ckProfiler, I receive the error: make: *** No rule to make target 'ckProfiler'. Stop. But if I do not use the -G Ninja parameter, the profiler compiles without any problems.
As far as I'm concerned the case is closed; if you agree, please close the issue.
@RandUser123sa Sure, I'll close the issue.
For some added context - Passing "-G Ninja" to CMake sets the generator to the Ninja build system (which is an incompatible alternative to Make), so CMake won't generate any targets that are buildable via Make. If you remove the "-G Ninja" argument, CMake defaults to using Make as the generator which generates targets for make, including ckProfiler.
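If you want, you can still use Ninja; you just need to run "ninja ckProfiler" instead of "make -j ckProfiler". (Ninja builds in parallel by default, and its -j option requires a job count, e.g. "ninja -j16 ckProfiler".)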
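The generator mismatch described above can be checked from the build directory itself: a Ninja configuration writes build.ninja, while the Makefile generator writes Makefile. The helper below is a small sketch of that check; the name build_tool_for is hypothetical.

```shell
#!/bin/sh
# Sketch: report which native build tool matches the files CMake generated
# in a build directory. build_tool_for is an illustrative helper name.
build_tool_for() {
    if [ -f "$1/build.ninja" ]; then
        echo ninja
    elif [ -f "$1/Makefile" ]; then
        echo make
    else
        echo "no build.ninja or Makefile in $1; run cmake there first" >&2
        return 1
    fi
}
```

For example, `"$(build_tool_for .)" ckProfiler` picks the matching tool. Alternatively, `cmake --build . --target ckProfiler` lets CMake invoke whichever tool it configured, so the same command works with either generator.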
FYI, I was getting the same error on spack for 6.2.0 and 6.2.1. The issue was only present on gfx906, and the build was successful when I tried on another machine with gfx908. I added a patch to comment out include(EnableCompilerWarnings) on spack: https://github.com/spack/spack/pull/46891
@afzpatel , are you saying you changed the GPU_TARGETS when you changed from your gfx906 machine to gfx908 (in which case, the faulting kernel may not have been built at all) or did you run the same build command on both machines but only encountered the build failure on one of them?
I took a closer look at the faulting area and these "unused parameters" are actually marked with [[maybe_unused]], which should suppress the compiler warning. I'm looking a bit more into why that's happening.
Same build command on both machines, I didn't specify GPU_TARGETS in both cases. Here's the spack build log: composable-kernel-6.2.0-spack-build-fail-out (1).txt
One more thing: this log doesn't have BUILD_DEV set to OFF, but I've tried that as well and got the same error.
@jamesxu2 it looks like the parameters in the faulting area are not marked [[maybe_unused]] in the ROCm 6.2.0 release: https://github.com/ROCm/composable_kernel/blob/rocm-6.2.0/include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3.hpp#L73
That was added in this commit: https://github.com/ROCm/composable_kernel/commit/959073842c0db839d45d565eb260fd018c996ce4
I'll try backporting this commit onto 6.2.0 in Spack and see if that fixes the issue.