MIOpenGEMM
[FP32] [miopengemm] GemmBwd1x1_stride1 verification failure
The GemmBwd1x1_stride1 kernel fails verification. The kernel writes only 320 result values to the image buffer; the remaining 4 values are always zero.
With a bigger input image, e.g. (--spatial_dim_elements 2 18 18), it writes only 640 elements of the image.
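The two reported counts can be checked with a little arithmetic. The sketch below is only an illustration: the helper names and the tile width of 64 are assumptions, not taken from MIOpenGEMM. It computes the spatial size of one image and the number of elements that would be written if a kernel covered only whole 64-element tiles, which happens to reproduce both 320 (of 324) and 640 (of 648).

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in one D x H x W image of an NCDHW tensor.
inline std::size_t spatial_elements(std::size_t d, std::size_t h, std::size_t w)
{
    return d * h * w;
}

// Hypothetical failure model: only whole tiles of `tile` elements are written,
// and the partial tile at the end is skipped. `tile` is an assumed parameter.
inline std::size_t written_if_tail_dropped(std::size_t total, std::size_t tile)
{
    assert(tile != 0);
    return (total / tile) * tile;
}
```

For example, `written_if_tail_dropped(spatial_elements(1, 18, 18), 64)` gives 320 and `written_if_tail_dropped(spatial_elements(2, 18, 18), 64)` gives 640. The match is only suggestive; this report does not establish that tail handling is the actual cause.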
The error occurs only in the second path (miopengemm.cpp:108):
// chao : there are 2 possible kernel paths for C = alpha * A * B + beta * C in MIOpenGEMM
// library
// 1) kernel_0 : C = alpha * A * B + beta * C
// 2) kernel_1 : C *= beta
//    kernel_2 : C += alpha * A * B
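To make the two paths in the comment concrete, here is a minimal CPU sketch of both. It is not MIOpenGEMM code: the function names, the row-major layout, and the use of `std::vector<float>` are all assumptions made for illustration. Path 1 updates C in a single pass; path 2 first scales C by beta and then accumulates alpha * A * B in a second step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Path 1 (one pass): C = alpha * A * B + beta * C.
// Row-major layout assumed: A is m x k, B is k x n, C is m x n.
inline void gemm_one_pass(const std::vector<float>& a, const std::vector<float>& b,
                          std::vector<float>& c, std::size_t m, std::size_t n,
                          std::size_t k, float alpha, float beta)
{
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for(std::size_t i = 0; i < m; ++i)
        for(std::size_t j = 0; j < n; ++j)
        {
            float acc = 0.0f;
            for(std::size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
}

// Path 2, first step: C *= beta.
inline void scale_c(std::vector<float>& c, float beta)
{
    for(float& x : c)
        x *= beta;
}

// Path 2, second step: C += alpha * A * B.
inline void accumulate_ab(const std::vector<float>& a, const std::vector<float>& b,
                          std::vector<float>& c, std::size_t m, std::size_t n,
                          std::size_t k, float alpha)
{
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for(std::size_t i = 0; i < m; ++i)
        for(std::size_t j = 0; j < n; ++j)
        {
            float acc = 0.0f;
            for(std::size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] += alpha * acc;
        }
}
```

In this sketch, any element that the second step of path 2 failed to visit would hold only beta * C, which is zero whenever beta is 0. That would be consistent with the zeros reported here, but it is a reading of the symptoms, not a confirmed diagnosis.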
bin/test_conv3d --float --cmode conv --pmode valid --group-count 1 --batch_size 8 --input_channels 32 --output_channels 32 --spatial_dim_elements 1 18 18 --filter_dims 1 1 1 --pads_strides_dilations 1 1 1 1 1 1 1 1 1 --trans_output_pads 0 0 0 --in_layout NCDHW --fil_layout NCDHW --out_layout NCDHW --disable-forward --disable-backward-weights
...
MIOpen(OpenCL): Info [get_device_name] Raw device name: gfx1030
MIOpen(OpenCL): Info [Handle] stream: 0x5645663fb6f0, device_id: 0x5645663a3210
MIOpen(OpenCL): Info [BackwardDataGetWorkSpaceSize]
MIOpen(OpenCL): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 5.1.22103, MIOpen version 2.18.0.333d4cc77
...
MIOpen(OpenCL): Info [EvaluateInvokers] GemmBwd1x1_stride1: : 0.069959 < 3.40282e+38
MIOpen(OpenCL): Info [EvaluateInvokers] Selected: GemmBwd1x1_stride1: : 0.069959, workspace_sz = 0
MIOpen(OpenCL): Info [Invoke] naive_conv_bwd_ncdhw_float_double_float
MIOpen(OpenCL): Info [EvaluateInvokers] ConvDirectNaiveConvBwd: naive_conv_bwd_ncdhw_float_double_float: 0.16112 < 3.40282e+38
MIOpen(OpenCL): Info [EvaluateInvokers] Selected: ConvDirectNaiveConvBwd: naive_conv_bwd_ncdhw_float_double_float: 0.16112, workspace_sz = 0
MIOpen(OpenCL): Info [FindConvBwdDataAlgorithm] miopenConvolutionBwdDataAlgoGEMM 0.069959 0
MIOpen(OpenCL): Info [FindConvBwdDataAlgorithm] miopenConvolutionBwdDataAlgoDirect 0.16112 0
MIOpen(OpenCL): Info [FindConvBwdDataAlgorithm] BWD Chosen Algorithm: GemmBwd1x1_stride1 , 0, 0.069959
MIOpen(OpenCL): Info [ConvolutionBackwardData] algo = 0, workspace = 0
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [Invoke] miog_betac
MIOpen(OpenCL): Info [Invoke] miog_alphaab
MIOpen(OpenCL): Info [GetBackwardSolutions]
MIOpen(OpenCL): Info [Measure] RamDb::Prefetch time: 0.02082 ms
check_in_dev4=-267 160 0 0 0 0 256 96 -232 0 0 0 0 -541 -176 85 0 0 0 0 237
bin/test_conv3d --float --cmode conv --pmode valid --group-count 1 --disable-forward --disable-backward-weights --batch_size 10 --input_channels 32 --output_channels 32 --spatial_dim_elements 1 18 18 --filter_dims 1 1 1 --pads_strides_dilations 1 1 1 1 1 1 1 1 1 --trans_output_pads 0 0 0 --in_layout NCDHW --fil_layout NCDHW --out_layout NCDHW
FAILED: 0.0619375
Iteration: 0
Backward convolution: GemmBwd1x1_stride1
Input tensor: 10, 32, 1, 18, 18
Weights tensor: 32, 32, 1, 1, 1
Output tensor: 10, 32, 1, 18, 18
Filter: conv3d, miopenConvolution, miopenPaddingValid, {0, 0, 0}, {1, 1, 1}, {1, 1, 1},
Max diff: 2.50575e+06
Mismatch at 320: -187 != 0
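// Debug: read back the first 1000 elements of the result buffer.
auto check_in_dev4 = handle.Read<T>(in_dev, 1000);
std::cout << "check_in_dev4=";
// Print indices 318..324 relative to the start of each of the first three
// 324-element (1 x 18 x 18) planes, i.e. the end of one plane and the first
// element of the next.
std::copy(check_in_dev4.begin() + 318, check_in_dev4.begin() + 325, std::ostream_iterator<T>(std::cout, " "));
std::copy(check_in_dev4.begin() + 324 + 318, check_in_dev4.begin() + 324 + 325, std::ostream_iterator<T>(std::cout, " "));
std::copy(check_in_dev4.begin() + 648 + 318, check_in_dev4.begin() + 648 + 325, std::ostream_iterator<T>(std::cout, " "));
std::cout << '\n';
rinput.data = handle.Read<T>(in_dev, rinput.data.size());
return rinput;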
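The three hand-written `std::copy` calls could be generalized into a small host-side helper that scans every plane. The sketch below is hypothetical (the name `trailing_zeros_per_chunk` and its interface are not part of MIOpen or its tests): it counts the run of zeros at the end of each consecutive chunk of a buffer that has already been read back to the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each complete chunk of `chunk` elements in `data`, count how many
// elements at the end of that chunk are equal to zero.
template <typename T>
std::vector<std::size_t> trailing_zeros_per_chunk(const std::vector<T>& data, std::size_t chunk)
{
    assert(chunk > 0);
    std::vector<std::size_t> counts;
    for(std::size_t start = 0; start + chunk <= data.size(); start += chunk)
    {
        std::size_t zeros = 0;
        // Walk backwards from the last element of the chunk.
        while(zeros < chunk && data[start + chunk - 1 - zeros] == T{})
            ++zeros;
        counts.push_back(zeros);
    }
    return counts;
}
```

Calling it as `trailing_zeros_per_chunk(check_in_dev4, 324)` would return one count per complete 324-element plane in the buffer. A genuinely zero result at the end of a plane would also be counted, so a nonzero count is only a hint, not proof, of an unwritten tail.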
Affected PR: ROCmSoftwarePlatform/MIOpen#1503