[iGemm] Some ConvHipImplicitGemm* solvers have very slow IsApplicable(): 5000...10000 us

Open atamazov opened this issue 4 years ago • 2 comments

Some solvers have an exceptionally slow IsApplicable(). This happens because IsApplicable() calls HeuristicInit(), which in turn is time-consuming.

The list of solvers:

  • ConvHipImplicitGemmForwardV4R4Xdlops
  • ConvHipImplicitGemmForwardV4R4Xdlops_Padded_Gemm
  • ConvHipImplicitGemmForwardV4R5Xdlops
  • ConvHipImplicitGemmWrwV4R4Xdlops
  • ConvHipImplicitGemmWrwV4R4Xdlops_Padded_Gemm

Log of GWSS and interpretation

Note that the issue affects not only GWSS but much more of the library's functionality. GWSS is simply a convenient place to measure the time of IsApplicable() using the library's own standard means (its logging).

Full log that shows the case during GetWorkspaceSize
# MIOPEN_FIND_MODE=normal \
> MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_ALL_SOLUTIONS=1 \
> MIOPEN_LOG_LEVEL=6 \
> MIOPEN_ENABLE_LOGGING_ELAPSED_TIME=1 \
> ./bin/MIOpenDriver conv -x 3 -y 3 -W 17 -H 17 -c 192 -n 128 -k 192 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -s 0 \
> -w 1 -t 1 -F 1 -i 1 -V 0
MIOpenDriver conv -x 3 -y 3 -W 17 -H 17 -c 192 -n 128 -k 192 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -s 0 -w 1 -t 1 -F 1 -i 1 -V 0
MIOpen(HIP)   0.001: Info [get_device_name] Raw device name: gfx90a:sramecc+:xnack-
MIOpen(HIP)   0.097: Info [Handle] stream: 0xb9f650, device_id: 0
MIOpen(HIP)   0.051: Info [GetFindModeValueImpl] MIOPEN_FIND_MODE = NORMAL(1)
MIOpen(HIP)   0.508: Info [ForwardGetWorkSpaceSize]
MIOpen(HIP)   0.053: Info2 [HipCompilerVersionImpl] Read version information from HIP package...
MIOpen(HIP)   0.020: Info [HipCompilerVersionImpl] 4.3.21331
MIOpen(HIP)   0.005: Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, MIOpen version 2.15.0.3e1d4b080
MIOpen(HIP)   0.052: Info2 [ValidateGcnAssemblerImpl] Running: '/opt/rocm/llvm/bin/clang --version'
MIOpen(HIP)   6.781: Info2 [ValidateGcnAssemblerImpl] clang version 13.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.3.1 21313 286c48af238c2d3a24ebc5a06ea5191f333eaed0)
MIOpen(HIP)   0.016: Info2 [ValidateGcnAssemblerImpl] Target: x86_64-unknown-linux-gnu
MIOpen(HIP)   0.008: Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(HIP)   0.006: Info2 [ValidateGcnAssemblerImpl] InstalledDir: /opt/rocm/llvm/bin
MIOpen(HIP)   0.005: Info2 [ValidateGcnAssemblerImpl]
MIOpen(HIP)   0.061: Info2 [GetWorkspaceSize] ConvBinWinograd3x3U: Not applicable
MIOpen(HIP)   0.029: Info2 [GetWorkspaceSize] ConvBinWinogradRxSf3x2: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvBinWinogradRxSf2x3: Not applicable
MIOpen(HIP)   0.021: Info2 [GetWorkspaceSize] ConvBinWinogradRxSf2x3g1: 0
MIOpen(HIP)   0.014: Info2 [GetWorkspaceSize] ConvBinWinogradRxS: Not applicable
MIOpen(HIP)   0.014: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd<3-3>: Not applicable
MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd<4-3>: Not applicable
MIOpen(HIP)   0.016: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd<5-3>: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd<6-3>: Not applicable
MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd_xdlops<2-3>: Not applicable
MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd_xdlops<3-3>: Not applicable
MIOpen(HIP)   0.017: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd_xdlops<4-3>: Not applicable
MIOpen(HIP)   0.009: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd_xdlops<5-3>: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvMPBidirectWinograd_xdlops<6-3>: Not applicable
MIOpen(HIP)   0.021: Info2 [GetWorkspaceSize] ConvAsm3x3U: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvAsm1x1U: Not applicable
MIOpen(HIP)   0.014: Info2 [GetWorkspaceSize] ConvAsm1x1UV2: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvAsm5x10u2v2f1: Not applicable
MIOpen(HIP)   0.009: Info2 [GetWorkspaceSize] ConvAsm7x7c3h224w224k64u2v2p3q3f1: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvAsm5x10u2v2b1: Not applicable
MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] ConvOclDirectFwd11x11: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvOclDirectFwdGen: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvOclDirectFwd1x1: Not applicable
MIOpen(HIP)   0.013: Info2 [GetPerformanceConfig] Returns: 16,16,32,32,2,2,8,2,1
MIOpen(HIP)   0.009: Info2 [GetWorkspaceSize] ConvOclDirectFwd: 0
MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] ConvDirectNaiveConvFwd: 0
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvDirectNaiveConvBwd: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvDirectNaiveConvWrw: Not applicable
MIOpen(HIP)   5.562: Info [HeuristicInit] 64,256,2,64,128,4,0,1,8
MIOpen(HIP)   0.014: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R5Xdlops: 0
MIOpen(HIP)  10.943: Info [HeuristicInit] 64,256,4,64,64,4,0,1,4
MIOpen(HIP)   0.013: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R4Xdlops: 0
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R4Xdlops_Padded_Gemm: Not applicable
MIOpen(HIP)   0.009: Info2 [GetWorkspaceSize] ConvHipImplicitGemmBwdDataV4R1Xdlops: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvHipImplicitGemmBwdDataV1R1Xdlops: Not applicable
MIOpen(HIP)   0.009: Info2 [GetWorkspaceSize] ConvHipImplicitGemmV4R1Fwd: 0
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvHipImplicitGemmV4R4Fwd: 0
MIOpen(HIP)   0.008: Info2 [GetWorkspaceSize] ConvMlirIgemmFwdXdlops: Not applicable
MIOpen(HIP)   0.009: Info2 [GetWorkspaceSize] ConvMlirIgemmFwd: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvMlirIgemmBwdXdlops: Not applicable
MIOpen(HIP)   0.009: Info2 [GetWorkspaceSize] ConvMlirIgemmBwd: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvHipImplicitGemmBwdDataV1R1: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvHipImplicitGemmBwdDataV4R1: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvAsmImplicitGemmV4R1DynamicFwd_1x1: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvAsmImplicitGemmV4R1DynamicFwd: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvAsmImplicitGemmV4R1DynamicBwd: Not applicable
MIOpen(HIP)   0.014: Info2 [GetWorkspaceSize] ConvAsmImplicitGemmGTCDynamicFwdXdlops: Not applicable
MIOpen(HIP)   0.037: Info2 [GetWorkspaceSize] ConvAsmImplicitGemmGTCDynamicBwdXdlops: Not applicable
MIOpen(HIP)   0.027: Info2 [GetWorkspaceSize] ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC: 36028416
MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] ConvAsmImplicitGemmGTCDynamicBwdXdlopsNHWC: Not applicable
MIOpen(HIP)   0.020: Info2 [GetWorkspaceSize] ConvCkIgemmFwdV6r1DlopsNchw: 4096
MIOpen(HIP)   0.014: Info2 [ForwardBackwardGetWorkSpaceSizeImplicitGemm] 0 < 36028416
MIOpen(HIP)   0.019: Info2 [GetWorkspaceSize] GemmFwd1x1_0_1: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] GemmFwd1x1_0_1_int8: Not applicable
MIOpen(HIP)   0.007: Info2 [GetWorkspaceSize] GemmFwd1x1_0_2: Not applicable
MIOpen(HIP)   0.021: Info2 [GetWorkspaceSize] GemmFwdRest: 442368
MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] GemmBwd1x1_stride1: Not applicable
MIOpen(HIP)   0.008: Info2 [GetWorkspaceSize] GemmBwd1x1_stride2: Not applicable
MIOpen(HIP)   0.007: Info2 [GetWorkspaceSize] GemmBwdRest: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] GemmWrw1x1_stride1: Not applicable
MIOpen(HIP)   0.007: Info2 [GetWorkspaceSize] GemmWrwUniversal: Not applicable
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] fft: Not applicable
MIOpen(HIP)   0.009: Info2 [ForwardGetWorkSpaceSize] 36028416
...
Wall-clock Time Forward Conv. Elapsed: 0.379714 ms, Auxiliary API calls: 47072.9 ms (GWSS: 24.3008)

Excerpt from the log:

MIOpen(HIP)   0.011: Info2 [GetWorkspaceSize] ConvDirectNaiveConvFwd: 0
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvDirectNaiveConvBwd: Not applicable
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvDirectNaiveConvWrw: Not applicable
MIOpen(HIP)   5.562: Info [HeuristicInit] 64,256,2,64,128,4,0,1,8
MIOpen(HIP)   0.014: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R5Xdlops: 0
MIOpen(HIP)  10.943: Info [HeuristicInit] 64,256,4,64,64,4,0,1,4
MIOpen(HIP)   0.013: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R4Xdlops: 0
MIOpen(HIP)   0.012: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R4Xdlops_Padded_Gemm: Not applicable

Interpretation:

Time taken by IsApplicable() for each solver in the excerpt:

  • ConvDirectNaiveConvBwd: 12 us
  • ConvDirectNaiveConvWrw: 10 us
  • ConvHipImplicitGemmForwardV4R5Xdlops: 5576 us (5562 + 14)
  • ConvHipImplicitGemmForwardV4R4Xdlops: 10956 us (10943 + 13)
  • ConvHipImplicitGemmForwardV4R4Xdlops_Padded_Gemm: 12 us
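
The per-solver figures above come from adding the elapsed time on each [HeuristicInit] line to the elapsed time on the [GetWorkspaceSize] line that follows it. A minimal Python sketch of that bookkeeping is given here for anyone who wants to repeat the measurement on a longer log. It is not part of MIOpen: the function name `is_applicable_cost_us` and the regular expression are illustrative. It assumes the elapsed-time column is in milliseconds (as the interpretation above implies) and that a [HeuristicInit] line always belongs to the next [GetWorkspaceSize] line.

```python
import re

# Matches log lines such as:
#   MIOpen(HIP)   5.562: Info [HeuristicInit] 64,256,2,64,128,4,0,1,8
# Groups: elapsed time, logging function name, rest of the message.
LINE_RE = re.compile(r"^MIOpen\(HIP\)\s+([0-9.]+):\s+\w+\s+\[(\w+)\]\s*(.*)$")


def is_applicable_cost_us(log_text):
    """Return {solver_name: elapsed_us} for every [GetWorkspaceSize] line.

    Assumption: time logged by [HeuristicInit] is attributed to the
    solver named on the next [GetWorkspaceSize] line.
    """
    costs = {}
    pending_ms = 0.0  # HeuristicInit time not yet attributed to a solver
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        elapsed_ms = float(match.group(1))
        func = match.group(2)
        rest = match.group(3)
        if func == "HeuristicInit":
            pending_ms += elapsed_ms
        elif func == "GetWorkspaceSize":
            # The message looks like "<SolverName>: <size or 'Not applicable'>"
            solver = rest.split(":", 1)[0].strip()
            costs[solver] = round((pending_ms + elapsed_ms) * 1000)
            pending_ms = 0.0
    return costs


if __name__ == "__main__":
    excerpt = """\
MIOpen(HIP)   0.010: Info2 [GetWorkspaceSize] ConvDirectNaiveConvWrw: Not applicable
MIOpen(HIP)   5.562: Info [HeuristicInit] 64,256,2,64,128,4,0,1,8
MIOpen(HIP)   0.014: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R5Xdlops: 0
MIOpen(HIP)  10.943: Info [HeuristicInit] 64,256,4,64,64,4,0,1,4
MIOpen(HIP)   0.013: Info2 [GetWorkspaceSize] ConvHipImplicitGemmForwardV4R4Xdlops: 0
"""
    # Print solvers from slowest to fastest.
    for name, us in sorted(is_applicable_cost_us(excerpt).items(), key=lambda kv: -kv[1]):
        print(f"{us:>6} us  {name}")
```

Sorting by cost makes the slow solvers stand out at the top of the output. Other lines that may precede a solver, such as [GetPerformanceConfig], are deliberately ignored by this sketch.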


[Informative] Related mail thread: "The performance of IsApplicable() of some ConvHipImplicitGemm*Xdlops solvers"

[Informative] Related issue: #1062

CC @yiqian1 @junliume

atamazov, Nov 19 '21 16:11

@atamazov Has this been resolved? Thanks!

ppanchad-amd, Apr 15 '24 18:04

@ppanchad-amd This stuff is aged so some investigation is required to answer.

atamazov, Apr 17 '24 21:04