composable_kernel

[Issue]: 6.2.0 compilation issue

RandUser123sa opened this issue

Problem Description

Hello,

I'm trying to compile version 6.2.0, but I get an error after a while. I configure the project with these parameters:

mkdir build && cd build
cmake \
  -Wno-dev \
  -G Ninja \
  -D CMAKE_HIP_COMPILER_ROCM_LIB=/opt/rocm/lib \
  -D HIP_LANG=/opt/rocm/lib64 \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_INSTALL_PREFIX=/opt/rocm \
  -D BUILD_DEV=OFF \
  -D INSTANCES_ONLY=ON \
  -D USE_BITINT_EXTENSION_INT4=ON \
  -D BUILD_TESTING=OFF \
  ..

ninja -j16

Part of the error is below; the full error log is attached.

Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o
FAILED: library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o
/opt/rocm/bin/hipcc -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_EXPERIMENTAL_BIT_INT_EXTENSION_INT4 -DCK_USE_WMMA -DCK_USE_XDL -DINSTANCES_ONLY -DUSE_PROF_API=1 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/include -I/mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/include -I/mnt/arhiv/rocm/rocm-build/build/composable_kernel/include -isystem /opt/rocm/include -O3 -DNDEBUG -std=c++17 -fPIC -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Wno-missing-field-initializers -Wno-deprecated-declarations -Wall -Wextra -Wcomment -Wendif-labels -Wformat -Winit-self -Wreturn-type -Wsequence-point -Wswitch -Wtrigraphs -Wundef -Wuninitialized -Wunreachable-code -Wunused -Wno-reserved-identifier -Werror -Wno-option-ignored -Wsign-compare -Wno-extra-semi-stmt -Wno-unused-template -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-conversion -Wno-double-promotion -Wno-exit-time-destructors -Wno-extra-semi -Wno-float-conversion -Wno-gnu-anonymous-struct -Wno-gnu-zero-variadic-macro-arguments -Wno-missing-prototypes -Wno-nested-anon-types -Wno-padded -Wno-return-std-move-in-c++11 -Wno-shorten-64-to-32 -Wno-sign-conversion -Wno-unknown-warning-option -Wno-unused-command-line-argument -Wno-weak-vtables -Wno-covered-switch-default -Wno-unsafe-buffer-usage -Wno-unused-lambda-capture -Wno-bit-int-extension -Wno-pass-failed -Wno-switch-default -fno-offload-uniform-block -mllvm -enable-post-misched=0 -x hip --offload-arch=gfx1100 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx940 --offload-arch=gfx941 --offload-arch=gfx942 -MD -MT library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o -MF library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o.d -o library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp.o -c /mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp
In file included from /mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/xdl/comp/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_f32_comp_instance.cpp:5:
In file included from /mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp:6:
/mnt/arhiv/rocm/release/composable_kernel-rocm-6.2.0/include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3.hpp:73:74: error: unused parameter 'a_grid_desc_ak0_m_ak1' [-Werror,-Wunused-parameter]
   73 | const AGridDesc_AK0_M_K1 a_grid_desc_ak0_m_ak1,
      | ^

Operating System

Slackware 15.0 x86_64

CPU

AMD Ryzen 7 3800X 8-Core Processor

GPU

AMD Radeon RX 7900 XT

Other

No response

ROCm Version

ROCm 6.0.0

ROCm Component

Composable Kernel

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents


Agent 1


Name: AMD Ryzen 7 3800X 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 3800X 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4560
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32780980(0x1f432b4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32780980(0x1f432b4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32780980(0x1f432b4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1100
Uuid: GPU-6e99eccb20090e4e
Marketing Name: AMD Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2482
BDFID: 3072
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 262
SDMA engine uCode:: 24
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

Additional Information

Here is the compilation error log file

error.log

RandUser123sa, Aug 21 '24

@RandUser123sa Slackware is not a supported OS. Please check the following page for supported OSes (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html). Please re-open the ticket if you still see the issue on any of the supported OSes. Thanks!

ppanchad-amd, Aug 27 '24

The only compilation problem is that the compiler stops on warnings because of -Werror,-Wunused-parameter; if those flags are removed, the project compiles successfully. And yes, I won't replace my favorite operating system just because of one package. When I reported the problem, I expected someone to point me to how to resolve it, not to a guide on replacing my operating system with one I don't like.

RandUser123sa, Sep 20 '24

Hi @RandUser123sa , we can try to get this working despite the unsupported OS. Thanks for your patience.

Can you explain exactly what you are trying to do by building CK with those flags (e.g., the quickest/minimal build of CK with support for INT4)?

I note that you're not specifying GPU_TARGETS; is that intentional? Your use of INSTANCES_ONLY=ON will build for all GPU targets supported by your compiler, which will take a lot longer. If you're only compiling for the RX 7900 XT, you can specify "-DGPU_TARGETS=gfx1100" as the GPU target.
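For illustration, here is the reporter's configure command from the top of this issue with GPU_TARGETS added, assuming the RX 7900 XT (reported by rocminfo as gfx1100) is the only GPU to build for. Every other option is copied unchanged from the report; this is a sketch that has not been tested against the 6.2.0 tree, and it does not by itself avoid the -Werror failure.

```shell
# Reporter's configure command, restricted to a single GPU target (gfx1100).
mkdir build && cd build
cmake \
  -Wno-dev \
  -G Ninja \
  -D CMAKE_HIP_COMPILER_ROCM_LIB=/opt/rocm/lib \
  -D HIP_LANG=/opt/rocm/lib64 \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_INSTALL_PREFIX=/opt/rocm \
  -D BUILD_DEV=OFF \
  -D INSTANCES_ONLY=ON \
  -D USE_BITINT_EXTENSION_INT4=ON \
  -D BUILD_TESTING=OFF \
  -D GPU_TARGETS=gfx1100 \
  ..
ninja -j16
```

To cover several architectures instead, a quoted, semicolon-separated CMake list such as -D GPU_TARGETS="gfx1100;gfx90a" should work, though which targets are accepted depends on the CK version and compiler.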

jamesxu2, Sep 23 '24

Hi @jamesxu2, I maintain a small, unofficial repository of ROCm 6.2 binaries compiled for Slackware users, which is why I did not specify GPU_TARGETS. I remember reading somewhere in the ROCm repositories that only INT16 and INT32 are safe to build, but even when I switched off INT4 with USE_BITINT_EXTENSION_INT4=OFF, the project failed to build. Yes, the compilation takes two and a half days. CK 6.1 compiled successfully, but 6.2 does not.

I came to the conclusion that there are 16 compilation warnings like the one above, and because the compiler is invoked with -Werror,-Wunused-parameter, it exits with an error on those warnings. When I removed those flags and manually re-ran the commands, the files compiled without problems. When I wrote the report, I was hoping someone from the developers would show me how to change the compilation parameters so that -Werror,-Wunused-parameter are not used.

RandUser123sa, Sep 30 '24

Hi @RandUser123sa , thanks for providing that context. I tried this locally and observed that the CK build is prone to crashing under some circumstances including in the case you report. The internal CK team is aware of this, and I'm working with them to figure out the reason for the crash.

Ideally, we'll find out why the files causing the -Werror,-Wunused-parameter errors are being built (this is most likely an issue with the build scripts) and find a fix that doesn't require you to manually remove those build flags.

jamesxu2, Sep 30 '24

"I'm trying to compile version 6.2.0"

@RandUser123sa, can you confirm that you're building composable_kernel from the /release/rocm-rel-6.2 branch? I notice from your build log that there are only [311] total files, which differs from the 783 I observe on the ROCm 6.2 release branch.

jamesxu2, Sep 30 '24

@jamesxu2 Yes, I'm using release versions only, and I got this one with:

wget https://github.com/ROCm/composable_kernel/archive/rocm-6.2.0.tar.gz

The file count differs because the compilation takes so long: when I got the error, I did not restart from the beginning (i.e., delete the build folder and run cmake again). Instead, I compiled the failing file manually by copying the command above, editing it to remove -Werror,-Wunused-parameter, and running it again.

RandUser123sa, Oct 01 '24

Hi @jamesxu2, I found the problem and resolved it. I think this is a trivial error. On closer inspection, line 292 of CMakeLists.txt contains include(EnableCompilerWarnings), which causes the errors. It seems this line is meant for development builds, not release versions. I changed it to:

if(BUILD_DEV)
    include(EnableCompilerWarnings)
endif()

and the project now compiles successfully on Slackware x86_64 current.

This is probably not the right place for that include; perhaps include(EnableCompilerWarnings) could be moved to line 448, alongside the other development-build settings, so that it looks like:

SET(BUILD_DEV ON CACHE BOOL "BUILD_DEV")
if(BUILD_DEV)
    add_compile_options(-Werror)
    add_compile_options(-Weverything)
    include(EnableCompilerWarnings)
endif()

But now I have a new issue: I can't build ckProfiler.

make -j ckProfiler

make: *** No rule to make target 'ckProfiler'. Stop.

Could you help with this?

RandUser123sa, Oct 08 '24

Hi @RandUser123sa , thanks for your investigation on the source of the compilation warnings - I'll bring this up internally.

Regarding the ckProfiler target, your use of the INSTANCES_ONLY directive explicitly removes the profiler as a build target. See the Changelog:

"INSTANCES_ONLY" -- Only builds CK library and instances without tests, examples, or profiler

The following command, without the extraneous flags, should make ckProfiler available as a build target:

cmake -DCMAKE_HIP_COMPILER_ROCM_LIB=/opt/rocm/lib -DHIP_LANG=/opt/rocm/lib64 -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/rocm ..

jamesxu2, Oct 08 '24

Hi @jamesxu2, just to report one more problem: when I configure with cmake -G Ninja as above and then try to compile ckProfiler with make -j ckProfiler, I get the error "make: *** No rule to make target 'ckProfiler'. Stop." If I don't pass -G Ninja, the profiler compiles without any problems.

For me the case is closed; if you agree, please close the issue.

RandUser123sa, Oct 08 '24

@RandUser123sa Sure, I'll close the issue.

For some added context: passing "-G Ninja" to CMake sets the generator to the Ninja build system (which is an incompatible alternative to Make), so CMake won't generate any targets that are buildable via Make. If you remove the "-G Ninja" argument, CMake defaults to using Make as the generator, which generates targets for make, including ckProfiler.

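If you want to keep using Ninja, run "ninja ckProfiler" instead of "make -j ckProfiler". Note that Ninja's -j option requires a job count (e.g. "ninja -j16 ckProfiler"); without -j, Ninja already builds in parallel by default.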
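A generator-independent alternative, assuming CMake 3.12 or newer (needed for --parallel), is to let CMake drive whichever build tool the build directory was configured with. The job count of 16 simply mirrors the reporter's ninja -j16. Either command only finds ckProfiler if the directory was configured without INSTANCES_ONLY=ON, since that option drops the profiler target.

```shell
# Run from the build directory (configured with or without -G Ninja).
# cmake --build forwards to Ninja or Make, so the same command works for both.
cmake --build . --target ckProfiler --parallel 16

# Equivalent direct invocation when the directory was configured with -G Ninja:
ninja -j16 ckProfiler
```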

jamesxu2, Oct 08 '24

FYI, I was getting the same error in Spack for 6.2.0 and 6.2.1. The issue was only present on gfx906, and the build was successful when I tried on another machine with gfx908. I added a patch to Spack that comments out include(EnableCompilerWarnings): https://github.com/spack/spack/pull/46891

afzpatel, Oct 09 '24

@afzpatel , are you saying you changed the GPU_TARGETS when you changed from your gfx906 machine to gfx908 (in which case, the faulting kernel may not have been built at all) or did you run the same build command on both machines but only encountered the build failure on one of them?

I took a closer look at the faulting area and these "unused parameters" are actually marked with [[maybe_unused]], which should suppress the compiler warning. I'm looking a bit more into why that's happening.

jamesxu2, Oct 09 '24

Same build command on both machines, I didn't specify GPU_TARGETS in both cases. Here's the spack build log: composable-kernel-6.2.0-spack-build-fail-out (1).txt

afzpatel, Oct 09 '24


One more thing: this log doesn't have BUILD_DEV set to OFF, but I've tried that as well and got the same error.

afzpatel, Oct 10 '24

@jamesxu2 it looks like the parameters in the faulting area are not marked [[maybe_unused]] in the ROCm 6.2.0 release: https://github.com/ROCm/composable_kernel/blob/rocm-6.2.0/include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3.hpp#L73

That was added in this commit: https://github.com/ROCm/composable_kernel/commit/959073842c0db839d45d565eb260fd018c996ce4

I'll try backporting this commit onto 6.2.0 in Spack and see if that fixes the issue.
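For a tarball-based build like the reporter's, one way to try the same backport is to download the commit as a patch file (GitHub serves a commit as a patch when .patch is appended to its URL) and apply it to the extracted 6.2.0 source. This is an untested sketch: the commit may touch other files or depend on earlier changes, so it may not apply cleanly, which is why the --dry-run check comes first. The patch file name is arbitrary.

```shell
# Fetch and unpack the 6.2.0 release tarball (same URL as in the report).
wget https://github.com/ROCm/composable_kernel/archive/rocm-6.2.0.tar.gz
tar xf rocm-6.2.0.tar.gz
cd composable_kernel-rocm-6.2.0

# Download the upstream commit that adds the [[maybe_unused]] annotations.
wget -O ck-maybe-unused.patch \
  https://github.com/ROCm/composable_kernel/commit/959073842c0db839d45d565eb260fd018c996ce4.patch

# Check that the patch applies cleanly before modifying the tree, then apply it.
patch -p1 --dry-run < ck-maybe-unused.patch && patch -p1 < ck-maybe-unused.patch
```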

afzpatel, Oct 10 '24