ck_gemm_softmax_gemm failing CI tests
The CK hipRTC test from the Jenkinsfile is failing in CI and blocking some merges. The failure is intermittent: I've also seen the test pass on the same hardware after restarting the CI job on the same server. When it does fail, it always prints the same results:
[2024-05-03T22:35:00.871Z] 339/354 Test #341: test_verify_gemm ..........................................................***Failed Error regular expression found in output. Regex=[FAILED]2289.93 sec
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_1<int8_t, int32_t>
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_1<int8_t, int32_t> (5436.89ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_1<migraphx::fp8::fp8e4m3fnuz, float>
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_1<migraphx::fp8::fp8e4m3fnuz, float> (1.30443ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_2<int8_t, int32_t>
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_2<int8_t, int32_t> (693.872ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_2<migraphx::fp8::fp8e4m3fnuz, float>
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_2<migraphx::fp8::fp8e4m3fnuz, float> (736.263ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_3migraphx::shape::int8_type
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_3migraphx::shape::int8_type (61.6461ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_3migraphx::shape::fp8e4m3fnuz_type
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_3migraphx::shape::fp8e4m3fnuz_type (747.364ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_4migraphx::shape::int8_type
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_4migraphx::shape::int8_type (955.169ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_4migraphx::shape::fp8e4m3fnuz_type
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_4migraphx::shape::fp8e4m3fnuz_type (834.423ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_5migraphx::shape::int8_type
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_5migraphx::shape::int8_type (729.565ms)
[2024-05-03T22:35:00.871Z] [ RUN ] batch_quant_dot_5migraphx::shape::fp8e4m3fnuz_type
[2024-05-03T22:35:00.871Z] [ COMPLETE ] batch_quant_dot_5migraphx::shape::fp8e4m3fnuz_type (815.728ms)
[2024-05-03T22:35:00.871Z] [ RUN ] ck_gemm_softmax_gemm
[2024-05-03T22:35:00.871Z] FAILED: gpu
[2024-05-03T22:35:00.871Z] RMS Error: 0.117354
[2024-05-03T22:35:00.871Z] Max diff: 1.09686
[2024-05-03T22:35:00.871Z] Mismatch at 0: -0.0474854 != -0.0221405
[2024-05-03T22:35:00.871Z]
[2024-05-03T22:35:00.871Z] module: "main"
[2024-05-03T22:35:00.871Z] @0 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @1 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 3 = @param:3 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 2 = @param:2 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 1 = @param:1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @5 = transposepermutation={0, 1, 3, 2} -> half_type, {1, 12, 256, 256}, {786432, 65536, 1, 256}, target_id=0
[2024-05-03T22:35:00.871Z] @6 = dot(1,@5) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @7 = mul(@6,@1) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @8 = add(@7,@0) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @9 = softmaxaxis=-1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @10 = dot(@9,3) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z]
[2024-05-03T22:35:00.871Z]
[2024-05-03T22:35:00.871Z] ref:
[2024-05-03T22:35:00.871Z] module: "main"
[2024-05-03T22:35:00.871Z] @0 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @1 = @literal{ ... } -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 3 = @param:3 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 2 = @param:2 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 1 = @param:1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @5 = ref::transposepermutation={0, 1, 3, 2} -> half_type, {1, 12, 256, 256}, {786432, 65536, 1, 256}, target_id=0
[2024-05-03T22:35:00.871Z] @6 = ref::contiguous(@5) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @7 = ref::dot(1,@6) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @8 = ref::mul(@7,@1) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @9 = ref::add(@8,@0) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @10 = ref::softmaxaxis=3 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @11 = ref::dot(@10,3) -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z]
[2024-05-03T22:35:00.871Z]
[2024-05-03T22:35:00.871Z] gpu:
[2024-05-03T22:35:00.871Z] module: "main"
[2024-05-03T22:35:00.871Z] @0 = check_context::migraphx::gpu::context -> float_type, {}, {}, target_id=0
[2024-05-03T22:35:00.871Z] 2 = @param:2 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] output = @param:output -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 3 = @param:3 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] 1 = @param:1 -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.871Z] @5 = transposepermutation={0, 1, 3, 2} -> half_type, {1, 12, 256, 256}, {786432, 65536, 1, 256}, target_id=0
[2024-05-03T22:35:00.872Z] @6 = gpu::code_objectcode_object=59464,symbol_name=ck_gemm_softmax_gemm_kernel,global=12288,local=256, -> half_type, {1, 12, 256, 256}, {786432, 65536, 256, 1}, target_id=0
[2024-05-03T22:35:00.872Z]
[2024-05-03T22:35:00.872Z]
[2024-05-03T22:35:00.872Z]
[2024-05-03T22:35:00.872Z] void run_verify::verify(const program_info &) const
[2024-05-03T22:35:00.872Z] /home/jenkins/workspace/AMDMIGraphX_PR-3003@2/test/verify/run_verify.cpp:254:
[2024-05-03T22:35:00.872Z] FAILED: passed [ 0 ]
[2024-05-03T22:35:00.872Z] [ FAILED ] ck_gemm_softmax_gemm (47679.9ms): Test failure
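
For anyone reading the `RMS Error`, `Max diff`, and `Mismatch at 0` lines, here is a minimal sketch of how numbers of that kind are typically derived when a GPU result is compared against a reference result. This is a generic illustration, not MIGraphX's verification code: the function name `compare_outputs` and the tolerance `tol` are hypothetical, and the real verifier may normalize the RMS value differently. The only data used is the single pair of values reported in the log; which of the two is the GPU value and which is the reference value is an assumption.

```python
import math


def compare_outputs(ref, gpu, tol=1e-3):
    """Return (rms_error, max_diff, first_mismatch_index) for two flat sequences.

    Hypothetical helper for illustration only; `tol` is an assumed threshold.
    """
    if len(ref) != len(gpu):
        raise ValueError("outputs must have the same number of elements")
    # Element-wise absolute differences between reference and GPU results.
    diffs = [abs(r - g) for r, g in zip(ref, gpu)]
    # Plain (unnormalized) root-mean-square of the differences.
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs)) if diffs else 0.0
    max_diff = max(diffs, default=0.0)
    # Index of the first element whose difference exceeds the tolerance.
    first = next((i for i, d in enumerate(diffs) if d > tol), None)
    return rms, max_diff, first


# The only element pair the log reports (index 0). The gpu/ref assignment
# below is an assumption; the log line does not say which side is which.
gpu_val, ref_val = -0.0474854, -0.0221405
rms, max_diff, first = compare_outputs([ref_val], [gpu_val])
print(f"rms={rms:.6f} max_diff={max_diff:.6f} first_mismatch={first}")
```

With only element 0 available the computed values cannot reproduce the log's aggregate figures, which cover the full `{1, 12, 256, 256}` output. The point is only that a max diff above 1 on a softmax-weighted half-precision output is far larger than ordinary rounding error would explain.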