[FP32] [miopengemm] GemmBwd1x1_stride1 verification failure

Open shurale-nkn opened this issue 3 years ago • 0 comments

GemmBwd1x1_stride1 kernel verification fails. The kernel writes only 320 of the 324 (1 x 18 x 18) result values to the image buffer; the remaining 4 values are always zero.

With a larger input image, such as (--spatial_dim_elements 2 18 18), it writes only 640 of the image's 648 elements.
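Both written counts (320 and 640) are multiples of 64, so the numbers would fit a kernel that stores only whole tiles along the spatial dimension and drops the remainder. This is only a hypothesis, not something confirmed in the code. The tile width of 64 and the helper names below are assumptions for illustration (widths of 16 or 32 fit the numbers equally well), and none of this is taken from MIOpenGEMM. A minimal sketch of the arithmetic:

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a D x H x W volume.
std::size_t volume(std::size_t d, std::size_t h, std::size_t w) { return d * h * w; }

// Round n down to the nearest multiple of tile.
std::size_t round_down(std::size_t n, std::size_t tile)
{
    assert(tile > 0);
    return n - n % tile;
}

// Elements left unwritten if only whole tiles are stored.
// ASSUMPTION: this "whole tiles only" model is a guess that happens to
// match the reported counts; it is not taken from MIOpenGEMM.
std::size_t unwritten_tail(std::size_t n, std::size_t tile)
{
    return n - round_down(n, tile);
}
```

With a tile width of 64, round_down(volume(1, 18, 18), 64) gives 320 and round_down(volume(2, 18, 18), 64) gives 640, which matches both reported counts.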

The error occurs only in the second kernel path (miopengemm.cpp:108):

    // chao : there are 2 possible kernel paths for C = alpha * A * B + beta * C in MIOpenGEMM
    // library
    //   1) kernel_0 : C = alpha * A * B + beta * C
    //   2) kernel_1 : C *= beta
    //      kernel_2 : C += alpha * A * B
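For reference, the two paths described in that comment can be sketched on the host as below. This is a plain CPU illustration with row-major matrices, not the MIOpenGEMM kernels. All function names are hypothetical. It shows that the fused path and the split path (scale, then accumulate) should produce the same C when both steps cover every element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrices: A is M x K, B is K x N, C is M x N.
// All function names here are hypothetical; none is a MIOpenGEMM API.

// Path 1: a single pass computing C = alpha * A * B + beta * C.
void gemm_fused(std::size_t M, std::size_t N, std::size_t K, float alpha,
                const std::vector<float>& A, const std::vector<float>& B,
                float beta, std::vector<float>& C)
{
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for(std::size_t i = 0; i < M; ++i)
        for(std::size_t j = 0; j < N; ++j)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
}

// Path 2, step 1: C *= beta over every element of C.
void scale_c(float beta, std::vector<float>& C)
{
    for(float& c : C)
        c *= beta;
}

// Path 2, step 2: C += alpha * A * B.
void accumulate_ab(std::size_t M, std::size_t N, std::size_t K, float alpha,
                   const std::vector<float>& A, const std::vector<float>& B,
                   std::vector<float>& C)
{
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for(std::size_t i = 0; i < M; ++i)
        for(std::size_t j = 0; j < N; ++j)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] += alpha * acc;
        }
}
```

In the split form, an element that the second step never reaches keeps whatever the first step left there. If beta were 0 (an assumption, the value is not shown in this report), that leftover would be zero, which would be consistent with the zero tail seen here.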
bin/test_conv3d --float --cmode conv --pmode valid --group-count 1 --batch_size 8 --input_channels 32 --output_channels 32 --spatial_dim_elements 1 18 18 --filter_dims 1 1 1 --pads_strides_dilations 1 1 1 1 1 1 1 1 1 --trans_output_pads 0 0 0 --in_layout NCDHW --fil_layout NCDHW --out_layout NCDHW --disable-forward --disable-backward-weights
...
MIOpen(OpenCL): Info [get_device_name] Raw device name: gfx1030
MIOpen(OpenCL): Info [Handle] stream: 0x5645663fb6f0, device_id: 0x5645663a3210
MIOpen(OpenCL): Info [BackwardDataGetWorkSpaceSize] 
MIOpen(OpenCL): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 5.1.22103, MIOpen version 2.18.0.333d4cc77
...
MIOpen(OpenCL): Info [EvaluateInvokers] GemmBwd1x1_stride1: : 0.069959 < 3.40282e+38
MIOpen(OpenCL): Info [EvaluateInvokers] Selected: GemmBwd1x1_stride1: : 0.069959, workspace_sz = 0
MIOpen(OpenCL): Info [Invoke] naive_conv_bwd_ncdhw_float_double_float
MIOpen(OpenCL): Info [EvaluateInvokers] ConvDirectNaiveConvBwd: naive_conv_bwd_ncdhw_float_double_float: 0.16112 < 3.40282e+38
MIOpen(OpenCL): Info [EvaluateInvokers] Selected: ConvDirectNaiveConvBwd: naive_conv_bwd_ncdhw_float_double_float: 0.16112, workspace_sz = 0
MIOpen(OpenCL): Info [FindConvBwdDataAlgorithm] miopenConvolutionBwdDataAlgoGEMM        0.069959        0
MIOpen(OpenCL): Info [FindConvBwdDataAlgorithm] miopenConvolutionBwdDataAlgoDirect      0.16112 0
MIOpen(OpenCL): Info [FindConvBwdDataAlgorithm] BWD Chosen Algorithm: GemmBwd1x1_stride1 , 0, 0.069959
MIOpen(OpenCL): Info [ConvolutionBackwardData] algo = 0, workspace = 0
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [GetBackwardSolutions] 
MIOpen(OpenCL): Info [Measure] RamDb::Prefetch time: 0.02082 ms
check_in_dev4=-267 160 0 0 0 0 256 96 -232 0 0 0 0 -541 -176 85 0 0 0 0 237 
bin/test_conv3d --float --cmode conv --pmode valid --group-count 1 --disable-forward --disable-backward-weights --batch_size 10 --input_channels 32 --output_channels 32 --spatial_dim_elements 1 18 18 --filter_dims 1 1 1 --pads_strides_dilations 1 1 1 1 1 1 1 1 1 --trans_output_pads 0 0 0 --in_layout NCDHW --fil_layout NCDHW --out_layout NCDHW 
FAILED: 0.0619375
Iteration: 0
Backward convolution: GemmBwd1x1_stride1
Input tensor: 10, 32, 1, 18, 18
Weights tensor: 32, 32, 1, 1, 1
Output tensor: 10, 32, 1, 18, 18
Filter: conv3d, miopenConvolution, miopenPaddingValid, {0, 0, 0}, {1, 1, 1}, {1, 1, 1}, 
Max diff: 2.50575e+06
Mismatch at 320: -187 != 0
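        // Debug read-back: fetch the first 1000 elements of in_dev and print
        // elements 318..324 of each of the first three 324-element (18 x 18) planes.
        auto check_in_dev4 = handle.Read<T>(in_dev, 1000);
        std::cout << "check_in_dev4=";
        // Plane 0: offsets 318..324
        std::copy(check_in_dev4.begin() + 318, check_in_dev4.begin() + 325, std::ostream_iterator<T>(std::cout, " "));
        // Plane 1: same window shifted by 324
        std::copy(check_in_dev4.begin() + 324 + 318, check_in_dev4.begin() + 324 + 325, std::ostream_iterator<T>(std::cout, " "));
        // Plane 2: same window shifted by 648
        std::copy(check_in_dev4.begin() + 648 + 318, check_in_dev4.begin() + 648 + 325, std::ostream_iterator<T>(std::cout, " "));
        std::cout << '\n';
        // Original read-back of the full result
        rinput.data = handle.Read<T>(in_dev, rinput.data.size());
        return rinput;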
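The debug print above looks at a fixed window in three planes. To check every plane of a host-side copy of the buffer, something like the following could be used. It is a minimal sketch: trailing_zeros_per_plane is a hypothetical helper, not part of MIOpen or its test driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each complete plane of plane_size consecutive elements in data,
// count how many elements at the end of that plane are exactly zero.
// Hypothetical helper for inspecting a host-side copy of the buffer.
template <typename T>
std::vector<std::size_t> trailing_zeros_per_plane(const std::vector<T>& data,
                                                  std::size_t plane_size)
{
    assert(plane_size > 0);
    std::vector<std::size_t> counts;
    // A partial plane at the end of data is ignored.
    for(std::size_t begin = 0; begin + plane_size <= data.size(); begin += plane_size)
    {
        std::size_t zeros = 0;
        while(zeros < plane_size && data[begin + plane_size - 1 - zeros] == T{0})
            ++zeros;
        counts.push_back(zeros);
    }
    return counts;
}
```

Called as trailing_zeros_per_plane(check_in_dev4, 324) on the 1000 elements read above, it would cover the first three planes. A count of 4 for every plane would match the failure described here. A correct result can also legitimately end in zero, so the count is only a hint and not a verdict.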

Affected PR: ROCmSoftwarePlatform/MIOpen#1503

shurale-nkn, Aug 04 '22 18:08