[CI] failed: SmokeSolverConvHipImplicitGemmBwdDataV4R1/Conv2dTuningV4R1BWDFloat.FloatTest_smoke_solver_ConvHipImplicitGemmBwdDataV4R1/0
Byproduct of https://github.com/ROCm/MIOpen/pull/3181
CI log (access may be restricted)
What worries us are these two lines:
MIOpen(HIP): Warning [BuildHip] /tmp/comgr-8efef0/input/static_kernel_gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw.cpp:6:60: error: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in 'gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw': desired occupancy was 2, final occupancy is 1 [-Werror,-Wpass-failed]
and
[2024-08-07T23:05:21.071Z] FAILED: /home/jenkins/workspace/MLLIBS_MIOpen_update_ck_0805/src/ocl/convolutionocl.cpp:904: No suitable algorithm was found to execute the required convolution
Should we suppress it or fix it?
[2024-08-07T23:05:21.071Z] [ RUN ] SmokeSolverConvHipImplicitGemmBwdDataV4R1/Conv2dTuningV4R1BWDFloat.FloatTest_smoke_solver_ConvHipImplicitGemmBwdDataV4R1/0
[2024-08-07T23:05:21.071Z] unnamed --float --verbose --disable-forward --disable-backward-weights --input 16 64 16 16 --weights 64 64 3 3 --pads_strides_dilations 0 0 1 1 1 1
[2024-08-07T23:05:21.071Z] PRNG seed: 12345678
[2024-08-07T23:05:21.071Z] unnamed --float --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 16 64 16 16 --weights 64 64 3 3 --batch_size 16 --input_channels 64 --output_channels 64 --spatial_dim_elements 16 16 --filter_dims 3 3 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW --deterministic 0 --tensor_vect 0 --vector_length 1 --output_type int32 --int8_vectorize 0
[2024-08-07T23:05:21.071Z] unnamed --float --cmode conv --pmode default --group-count 1 --disable-forward --disable-backward-weights --input 16 64 16 16 --weights 64 64 3 3 --batch_size 16 --input_channels 64 --output_channels 64 --spatial_dim_elements 16 16 --filter_dims 3 3 --pads_strides_dilations 0 0 1 1 1 1 --trans_output_pads 0 0 --in_layout NCHW --fil_layout NCHW --out_layout NCHW --deterministic 0 --tensor_vect 0 --vector_length 1 --output_type int32 --int8_vectorize 0
[2024-08-07T23:05:21.071Z] FAILED: /home/jenkins/workspace/MLLIBS_MIOpen_update_ck_0805/src/ocl/convolutionocl.cpp:904: No suitable algorithm was found to execute the required convolution
[2024-08-07T23:05:21.071Z] Backward convolution:
[2024-08-07T23:05:21.071Z] Input tensor: 16, 64, 16, 16
[2024-08-07T23:05:21.071Z] Weights tensor: 64, 64, 3, 3
[2024-08-07T23:05:21.071Z] Output tensor: 16, 64, 14, 14
[2024-08-07T23:05:21.071Z] Filter: conv2d, miopenConvolution, miopenPaddingDefault, {0, 0}, {1, 1}, {1, 1},
[2024-08-07T23:05:21.071Z] /home/jenkins/workspace/MLLIBS_MIOpen_update_ck_0805/test/gtest/gtest_common.hpp:46: Failure
[2024-08-07T23:05:21.071Z] Value of: err.find("Error") != std::string::npos || err.find("failed") != std::string::npos
[2024-08-07T23:05:21.071Z] Actual: true
[2024-08-07T23:05:21.071Z] Expected: false
[2024-08-07T23:05:21.071Z]
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [FindSolutionImpl] Perf Db: load skipped: ConvHipImplicitGemmBwdDataV4R1, enforce: SEARCH_DB_UPDATE(4)
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [GetAllConfigs] ConvHipImplicitGemmBwdDataV4R1: Searching the best solution among 24...
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [Monitor] 0/0/5 0.103487, best within recent 1: 0.103487 #0 64,32,32,8,2,2, ETA:0 sec.
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [Monitor] 1/0/5 0.103487, best within recent 1: 0.14272 #1 256,64,128,4,2,4, ETA:42.0482 sec.
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [Monitor] 2/0/5 0.103487, best within recent 1: 0.11376 #2 64,32,32,4,2,2, ETA:21.8606 sec.
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [Monitor] 3/0/5 0.0885118, best within recent 1: 0.0885118 #3 256,64,64,8,2,2, ETA:12.4697 sec.
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [GenericSearch] Done: 5/0/5, best #3 0.0885118 256,64,64,8,2,2
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' static_kernel_gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw.cpp: HIPRTC_ERROR_COMPILATION (6)
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: static_kernel_gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw.cpp
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Warning [BuildHip] /tmp/comgr-8efef0/input/static_kernel_gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw.cpp:6:60: error: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in 'gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw': desired occupancy was 2, final occupancy is 1 [-Werror,-Wpass-failed]
[2024-08-07T23:05:21.071Z] 6 | __launch_bounds__(CK_PARAM_TUNABLE_BLOCK_SIZE, 2) void gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw(
[2024-08-07T23:05:21.071Z] | ^
[2024-08-07T23:05:21.071Z] 1 error generated when compiling for gfx90a.
[2024-08-07T23:05:21.071Z] MIOpen(HIP): Error [FindSolutionImpl] Search failed for: ConvHipImplicitGemmBwdDataV4R1: /home/jenkins/workspace/MLLIBS_MIOpen_update_ck_0805/src/hipoc/hipoc_program.cpp:296: Code object build failed. Source: static_kernel_gridwise_convolution_backward_data_implicit_gemm_v4r1_nchw_kcyx_nkhw.cpp
[2024-08-07T23:05:21.071Z] [ FAILED ] SmokeSolverConvHipImplicitGemmBwdDataV4R1/Conv2dTuningV4R1BWDFloat.FloatTest_smoke_solver_ConvHipImplicitGemmBwdDataV4R1/0, where GetParam() = { (((0, 0x2104a7 pointing to "SEARCH_DB_UPDATE"), (1, 5), (0, true), (0, 0x21179d pointing to "normal"), (0, 0x21065a pointing to "ConvHipImplicitGemmBwdDataV4R1")), " --verbose --disable-forward --disable-backward-weights --input 16 64 16 16 --weights 64 64 3 3 --pads_strides_dilations 0 0 1 1 1 1") } (25734 ms)
[2024-08-07T23:05:21.071Z] [----------] 1 test from SmokeSolverConvHipImplicitGemmBwdDataV4R1/Conv2dTuningV4R1BWDFloat (25734 ms total)
[2024-08-07T23:05:21.071Z]
[2024-08-07T23:05:21.071Z] [----------] Global test environment tear-down
[2024-08-07T23:05:21.071Z] [==========] 1 test from 1 test suite ran. (25734 ms total)
[2024-08-07T23:05:21.071Z] [ PASSED ] 0 tests.
[2024-08-07T23:05:21.071Z] [ FAILED ] 1 test, listed below:
[2024-08-07T23:05:21.071Z] [ FAILED ] SmokeSolverConvHipImplicitGemmBwdDataV4R1/Conv2dTuningV4R1BWDFloat.FloatTest_smoke_solver_ConvHipImplicitGemmBwdDataV4R1/0, where GetParam() = { (((0, 0x2104a7 pointing to "SEARCH_DB_UPDATE"), (1, 5), (0, true), (0, 0x21179d pointing to "normal"), (0, 0x21065a pointing to "ConvHipImplicitGemmBwdDataV4R1")), " --verbose --disable-forward --disable-backward-weights --input 16 64 16 16 --weights 64 64 3 3 --pads_strides_dilations 0 0 1 1 1 1") }
[2024-08-07T23:05:21.071Z]
[2024-08-07T23:05:21.071Z] 1 FAILED TEST
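For readers tracing why the test is marked failed: the assertion reported at gtest_common.hpp:46 in the log above rejects any captured stderr that contains "Error" or "failed". The sketch below mirrors only that condition. The function name `StderrLooksClean` is hypothetical and is not the actual helper in gtest_common.hpp.

```cpp
#include <string>

// Hypothetical stand-in for the condition printed in the log:
//   err.find("Error") != npos || err.find("failed") != npos  -> expected false
// Returns true when neither marker occurs in the captured stderr text.
bool StderrLooksClean(const std::string& err)
{
    return err.find("Error") == std::string::npos &&
           err.find("failed") == std::string::npos;
}
```

Under this check, the compiler diagnostic "error: failed to meet occupancy target ..." is enough to fail the test on its own because it contains "failed", regardless of the "No suitable algorithm was found" message that follows.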
[Notice] @junliume Commit 654489fe4aaf501b76e7a95f8eb3aaf4334322c4 looks good, but it might be worth capping the W/A at ROCm 6.3. If the failure shows up again, we can extend the W/A to 6.4, and so on. Why: the W/A affects all HIP kernels, but I suspect kernel developers would want to know about cases where the compiler fails to honor the __launch_bounds__ hints. I think this wide W/A is acceptable for 6.2, where the compiler has some known peculiarities, but AFAICU we expect 6.3 to be in better shape.
Refs:
- https://rocm.docs.amd.com/projects/HIP/en/develop/reference/kernel_language.html#launch-bounds
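A minimal sketch of what a version-capped W/A could look like, under two assumptions that the thread does not confirm: that the W/A works by demoting the `-Wpass-failed` diagnostic from an error via the generic clang option `-Wno-error=pass-failed`, and that "capping at 6.3" means applying it only to ROCm versions before 6.3. `RocmVersion`, `NeedsOccupancyWorkaround`, and `BuildOptions` are hypothetical names, not MIOpen code.

```cpp
#include <string>
#include <vector>

// Hypothetical version descriptor; real code would take this from the toolchain.
struct RocmVersion
{
    int major;
    int minor;
};

// Assumption: the W/A is needed only for ROCm versions before 6.3.
constexpr bool NeedsOccupancyWorkaround(RocmVersion v)
{
    return v.major < 6 || (v.major == 6 && v.minor < 3);
}

// Builds an illustrative list of compiler options for a HIP kernel.
std::vector<std::string> BuildOptions(RocmVersion v)
{
    std::vector<std::string> opts = {"-O3"};
    if(NeedsOccupancyWorkaround(v))
    {
        // Assumed form of the W/A: keep the occupancy diagnostic visible as a
        // warning, but stop -Werror from turning it into a build failure.
        opts.push_back("-Wno-error=pass-failed");
    }
    return opts;
}
```

With such a gate, a build against 6.3 or later would again fail loudly when the compiler cannot meet a `__launch_bounds__` hint, which is the visibility the comment above asks for. Extending the W/A to 6.4 would then mean changing one comparison.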
This issue has been migrated to: https://github.com/ROCm/rocm-libraries/issues/871