OpenCL-Wrapper

Example code does not work on arm64 platform Orange Pi 5 (RK3588)

Open jackwei86 opened this issue 1 year ago • 1 comments

(1) Directly using the OpenCL C++ bindings works

arm_release_ver: g13p0-01eac0, rk_so_ver: 10

| Info: Mali-G610 r0p0 |

| Info: OpenCL C code successfully compiled. |

| Info: Value before kernel execution: C[0] = 1.00000000 |

| Info: Value after kernel execution: C[0] = 5.00000000 |


(2) But using the wrapper does not work.

Output Info :

arm_release_ver: g13p0-01eac0, rk_so_ver: 10

|----------------.------------------------------------------------------------|
| Device ID 0    | Mali-G610 r0p0                                             |
|----------------'------------------------------------------------------------|

|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-G610 r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 3.0 (Linux)                                                |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15957 MB RAM, 1024 KB global / 32 KB local                 |
| Buffer Limits  | 15957 MB global, 16340404 KB constant                      |
|----------------'------------------------------------------------------------|

| Info: OpenCL C code successfully compiled. |

| Info: Value before kernel execution: C[0] = 1.00000000 |

| Info: Value after kernel execution: C[0] = 1.00000000 |

It seems nothing happened after kernel execution.

Sharing host memory with the device (zero-copy) does not seem to work.

The relevant code in the Memory class is: `inline void allocate_device_buffer(Device& device, const bool allocate_device, const bool allow_zero_copy) {

 -------
		device_buffer = cl::Buffer( // if(is_zero_copy) { don't allocate extra memory on CPUs/iGPUs } else { allocate VRAM on GPUs }
			device.get_cl_context(),
			CL_MEM_READ_WRITE|((int)is_zero_copy*CL_MEM_USE_HOST_PTR)|((int)device.info.patch_intel_gpu_above_4gb<<23), // for Intel GPUs set flag CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL = (1<<23)
			is_zero_copy ? ((capacity()+63ull)/64ull)*64ull : capacity(), // buffer capacity must be a multiple of 64 Bytes for CL_MEM_USE_HOST_PTR
			is_zero_copy ? (void*)host_buffer : nullptr,
			&error
		);

}`
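To make the snippet above easier to follow, here is a minimal standalone sketch of how the flags and the buffer size are derived from `is_zero_copy`. It does not call OpenCL: the two constants mirror the `cl_mem_flags` bit values from the Khronos `CL/cl.h` header, and `BufferArgs` and `buffer_args` are hypothetical names used only for illustration. The Intel-specific flag from the original line is left out.

```cpp
#include <cassert>
#include <cstdint>

// Bit values mirrored from the cl_mem_flags definitions in CL/cl.h.
constexpr std::uint64_t kMemReadWrite  = 1ull << 0; // CL_MEM_READ_WRITE
constexpr std::uint64_t kMemUseHostPtr = 1ull << 3; // CL_MEM_USE_HOST_PTR

// Round a byte count up to the next multiple of 64,
// as the wrapper does for CL_MEM_USE_HOST_PTR buffers.
constexpr std::uint64_t round_up_64(std::uint64_t bytes) {
    return ((bytes + 63ull) / 64ull) * 64ull;
}

// Hypothetical container for the arguments that would go to cl::Buffer.
struct BufferArgs {
    std::uint64_t flags;  // memory flags
    std::uint64_t size;   // buffer size in bytes
    bool pass_host_ptr;   // true: pass host_buffer, false: pass nullptr
};

// Mirrors the ternaries in the snippet: zero-copy reuses the host allocation,
// otherwise a separate device allocation of exactly `capacity` bytes is made.
constexpr BufferArgs buffer_args(std::uint64_t capacity, bool is_zero_copy) {
    return BufferArgs{
        kMemReadWrite | (is_zero_copy ? kMemUseHostPtr : 0ull),
        is_zero_copy ? round_up_64(capacity) : capacity,
        is_zero_copy
    };
}

// Small self-check that can be called from any test or main function.
inline void check_buffer_args() {
    assert(buffer_args(100, true).size == 128);
    assert(buffer_args(100, false).size == 100);
}
```

With `is_zero_copy` set, the device buffer aliases the host allocation, so whether kernel writes become visible on the host depends on how the driver handles `CL_MEM_USE_HOST_PTR`. That is the path that appears to misbehave in this report.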

jackwei86, Dec 19 '24 03:12

After setting allow_zero_copy = false for all three buffers, it works (C[0] = 5.00000000):

Memory A(device, N,1,true,true,0,false);
Memory B(device, N,1,true,true,0,false);
Memory C(device, N,1,true,true,0,false);

But when A and B use the default arguments and only C passes allow_zero_copy = false:

Memory A(device, N,1);
Memory B(device, N,1);
Memory C(device, N,1,true,true,0,false);

the result is wrong:

C[0] = 2.00861049E13

For your reference: "Do not create buffers with CL_MEM_USE_HOST_PTR if possible"

jackwei86, Dec 19 '24 03:12

Hi @jackwei86,

thanks a lot for reporting this! I have fixed it by disabling zero-copy on ARM iGPUs.

Kind regards, Moritz

ProjectPhysX, Sep 14 '25 13:09
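For readers who want to apply a similar guard in their own code, the following is a rough sketch of what "disabling zero-copy on ARM iGPUs" could look like. It is not the actual change made in OpenCL-Wrapper: `DeviceTraits`, its fields, and `use_zero_copy` are hypothetical names, and the vendor check assumes the vendor string contains "ARM", as in the device table reported above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical device description; the field names are illustrative only.
struct DeviceTraits {
    std::string vendor;       // e.g. the "Device Vendor" string ("ARM" in the report)
    bool shares_host_memory;  // true for CPUs and integrated GPUs
};

// Lower-case a copy of the string so the vendor match is case-insensitive.
inline std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Decide whether to create the buffer with CL_MEM_USE_HOST_PTR.
// Zero-copy is used only if the caller allows it, the device shares
// memory with the host, and the vendor is not ARM (assumed workaround).
inline bool use_zero_copy(const DeviceTraits& d, bool allow_zero_copy) {
    const bool is_arm = to_lower(d.vendor).find("arm") != std::string::npos;
    return allow_zero_copy && d.shares_host_memory && !is_arm;
}

// Small self-check that can be called from any test or main function.
inline void check_use_zero_copy() {
    assert(!use_zero_copy(DeviceTraits{"ARM", true}, true));
}
```

A substring match on the vendor name is a blunt heuristic. A real implementation might prefer a more specific device or driver check.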