MIVisionX
[Issue]: OpenVX - AMD Custom Kernels: GPU Failures (HIP & OCL)
Problem Description
The following GPU kernels fail:
HIP
Convolve_S16_U8_3x9.gdf
Convolve_S16_U8_5x5.gdf
Convolve_S16_U8_7x7.gdf
Convolve_S16_U8_ANY_ANY.gdf
Convolve_U8_U8_3x9.gdf
Convolve_U8_U8_5x5.gdf
Convolve_U8_U8_7x7.gdf
Convolve_U8_U8_odd.gdf
OCL
Convolve_S16_U8_9x9.gdf
Convolve_U8_U8_9x9.gdf
Dilate_U1_U1_3x3.gdf
Dilate_U8_U1_3x3.gdf
Erode_U1_U1_3x3.gdf
Erode_U8_U1_3x3.gdf
WarpPerspective_U8_U8_Bilinear.gdf
WarpPerspective_U8_U8_Nearest.gdf
Operating System
ALL
CPU
ANY
GPU
AMD Instinct MI300
Other
No response
ROCm Version
ROCm 6.0.0
ROCm Component
MIVisionX
Steps to Reproduce
Run each GDF below with runvx to reproduce the errors (runvx <gdf>):
HIP
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_S16_U8_3x9.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_S16_U8_5x5.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_S16_U8_7x7.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_S16_U8_odd.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_U8_U8_3x9.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_U8_U8_5x5.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_U8_U8_7x7.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAIL_Convolve_U8_U8_odd.gdf
OCL
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_Convolve_S16_U8_9x9.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_Convolve_U8_U8_9x9.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_Dilate_U1_U1_3x3.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_Dilate_U8_U1_3x3.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_Erode_U1_U1_3x3.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_Erode_U8_U1_3x3.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_WarpPerspective_U8_U8_Bilinear.gdf
tests/amd_openvx_gdfs/cpu/hidden/GPU_FAILURE_OCL_WarpPerspective_U8_U8_Nearest.gdf
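The reproduction step can be scripted. The following is a minimal sketch, not part of the MIVisionX test suite: `run_gdfs` and its `cmd` parameter are hypothetical names, and the bare `runvx <gdf>` invocation is assumed from the step above, so add whatever options your setup needs. Each GDF runs in its own process so that one GPU abort does not stop the remaining runs.

```python
import subprocess


def run_gdfs(gdf_paths, cmd=("runvx",)):
    """Run each GDF in a separate process and classify the outcome.

    `cmd` is the command prefix; the GDF path is appended as the last
    argument (assumption: plain `runvx <gdf>`, as in the reproduction step).
    Returns a dict mapping each path to "ok", "exit N", or "signal N".
    """
    results = {}
    for path in gdf_paths:
        proc = subprocess.run([*cmd, path], capture_output=True, text=True)
        if proc.returncode == 0:
            results[path] = "ok"
        elif proc.returncode < 0:
            # On POSIX a negative return code means the process was killed
            # by that signal number (for example 6 for SIGABRT).
            results[path] = f"signal {-proc.returncode}"
        else:
            results[path] = f"exit {proc.returncode}"
    return results
```

Calling `run_gdfs` with the paths listed above returns a dictionary that maps each path to its status, which makes it easy to see which GDFs still abort after a fix.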
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
The kernels that fail on HIP also fail on OCL.
Failures: Running GDF - 26:GPU_FAIL_Convolve_S16_U8_3x9.gdf
GPU_FAIL_Convolve_S16_U8_3x9.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_S16_U8_3x9.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,S016
data input_matrix = convolution:3,9
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:S016,1920,1080
data input_matrix = convolution:3,9
node com.amd.openvx.Convolve_S16_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0x3f271dd0) on address 0x77ca5e5ff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
Failures: Running GDF - 27:GPU_FAIL_Convolve_S16_U8_5x5.gdf
GPU_FAIL_Convolve_S16_U8_5x5.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_S16_U8_5x5.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,S016
data input_matrix = convolution:5,5:INIT,{-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;16;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1}
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:S016,1920,1080
data input_matrix = convolution:5,5
node com.amd.openvx.Convolve_S16_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0xf31e1d0) on address 0x7960ff5ff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
Failures: Running GDF - 28:GPU_FAIL_Convolve_S16_U8_7x7.gdf
GPU_FAIL_Convolve_S16_U8_7x7.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_S16_U8_7x7.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,S016
data input_matrix = convolution:7,7:INIT,{-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;16;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1}
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:S016,1920,1080
data input_matrix = convolution:7,7
node com.amd.openvx.Convolve_S16_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0x1ede0140) on address 0x7312d49ff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
Failures: Running GDF - 29:GPU_FAIL_Convolve_S16_U8_odd.gdf
GPU_FAIL_Convolve_S16_U8_odd.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_S16_U8_odd.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,S016
data input_matrix = convolution:9,7
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:S016,1920,1080
data input_matrix = convolution:9,7
node com.amd.openvx.Convolve_S16_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0x26510dd0) on address 0x736e7efff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
Failures: Running GDF - 30:GPU_FAIL_Convolve_U8_U8_3x9.gdf
GPU_FAIL_Convolve_U8_U8_3x9.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_U8_U8_3x9.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,U008
data input_matrix = convolution:3,9
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:U008,1920,1080
data input_matrix = convolution:3,9
node com.amd.openvx.Convolve_U8_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0x203d8dc0) on address 0x73fe901ff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
Failures: Running GDF - 31:GPU_FAIL_Convolve_U8_U8_5x5.gdf
GPU_FAIL_Convolve_U8_U8_5x5.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_U8_U8_5x5.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,U008
data input_matrix = convolution:5,5:INIT,{-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;16;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1}
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:U008,1920,1080
data input_matrix = convolution:5,5
node com.amd.openvx.Convolve_U8_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0x2a4761c0) on address 0x72351c9ff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
Failures: Running GDF - 32:GPU_FAIL_Convolve_U8_U8_7x7.gdf
GPU_FAIL_Convolve_U8_U8_7x7.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_U8_U8_7x7.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,U008
data input_matrix = convolution:7,7:INIT,{-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;16;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1;-1}
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:U008,1920,1080
data input_matrix = convolution:7,7
node com.amd.openvx.Convolve_U8_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0x2426d130) on address 0x75bcef7ff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
Failures: Running GDF - 33:GPU_FAIL_Convolve_U8_U8_odd.gdf
GPU_FAIL_Convolve_U8_U8_odd.gdf
runvx 1.0.0
OK: using AMD OpenVX 1.3.0
include GPU_FAIL_Convolve_U8_U8_odd.gdf
data input_1 = uniform-image:1920,1080,U008,125
data output_1 = image:1920,1080,U008
data input_matrix = convolution:9,7
node org.khronos.openvx.custom_convolution input_1 input_matrix output_1
OK: OpenVX using GPU device - 0: gfx1030 [OpenCL 2.0 ] [CL_DEVICE_SVM_CAPABILITIES 0 0]
# ago graph dump BEGIN [internal]
data input_1 = image-uniform:U008,1920,1080,125
data output_1 = image:U008,1920,1080
data input_matrix = convolution:9,7
node com.amd.openvx.Convolve_U8_U8 output_1 input_1 input_matrix attr:AFFINITY:GPU,1
# ago graph dump END [internal]
csv,HEADER ,STATUS, COUNT,cur-ms,avg-ms,min-ms,clenqueue-ms,clwait-ms,clwrite-ms,clread-ms
Memory access fault by GPU node-1 (Agent handle: 0xdfd1dc0) on address 0x7ddca19ff000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
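To compare the failures side by side, the logs above can be reduced to one record per fault. This is a rough sketch that assumes the log lines keep exactly the wording shown in this issue; `parse_faults` is a hypothetical helper, and the 4 KiB page size used for the alignment check is an assumption. In the logs above, every reported address is a multiple of 0x1000, which would be consistent with an access running past the end of a buffer into an unmapped page, although the logs alone do not prove that.

```python
import re

RUN_RE = re.compile(r"Failures: Running GDF - (\d+):(\S+)")
FAULT_RE = re.compile(
    r"Memory access fault by GPU node-(\d+) \(Agent handle: (0x[0-9a-fA-F]+)\) "
    r"on address (0x[0-9a-fA-F]+)\. Reason: (.+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_faults(log_text):
    """Return one dict per 'Memory access fault' line, tagged with its GDF."""
    faults = []
    current_gdf = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        run = RUN_RE.match(line)
        if run:
            # Remember which GDF the following lines belong to.
            current_gdf = run.group(2)
            continue
        fault = FAULT_RE.match(line)
        if fault:
            address = int(fault.group(3), 16)
            faults.append(
                {
                    "gdf": current_gdf,
                    "address": address,
                    "page_aligned": address % PAGE_SIZE == 0,
                    "reason": fault.group(4),
                }
            )
    return faults
```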
@AryanSalmanpour: Can you take a look at this issue? It is caused by out-of-bounds memory access in the HIP kernels.
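As a generic illustration of how a convolution can read out of bounds, the sketch below computes which coordinates a window touches near the image border. It is not the MIVisionX kernel code: `window_extent`, `count_border_outputs`, and `clamp` are hypothetical helpers, and it assumes the two numbers in `convolution:3,9` are columns and rows. A kernel that loads the full window for border pixels without clamping or skipping them would read coordinates such as the negative ones shown here.

```python
def window_extent(x, y, cols, rows):
    """Inclusive (x_min, x_max, y_min, y_max) read by a window centred on (x, y)."""
    rx, ry = cols // 2, rows // 2
    return x - rx, x + rx, y - ry, y + ry


def count_border_outputs(width, height, cols, rows):
    """Count output pixels whose window reaches outside a width x height image."""
    rx, ry = cols // 2, rows // 2
    inner_w = max(width - 2 * rx, 0)
    inner_h = max(height - 2 * ry, 0)
    return width * height - inner_w * inner_h


def clamp(value, low, high):
    """Clamp a coordinate into [low, high]; one common way to keep reads in range."""
    return max(low, min(value, high))


# The top-left output pixel of the 3x9 case reads rows -4..4 and columns -1..1.
print(window_extent(0, 0, 3, 9))
# Number of 1920x1080 output pixels that need border handling for a 3x9 window.
print(count_border_outputs(1920, 1080, 3, 9))
```

Clamping every coordinate, as `clamp(x, 0, width - 1)` does, is only one option; a kernel can also skip the border region entirely, and which approach fits depends on the border mode the graph requests.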