Example code does not work on arm64 platform Orange Pi 5 (RK3588)
(1) Using the OpenCL C++ bindings directly works:
arm_release_ver: g13p0-01eac0, rk_so_ver: 10
| Info: Mali-G610 r0p0 |
| Info: OpenCL C code successfully compiled. |
| Info: Value before kernel execution: C[0] = 1.00000000 |
| Info: Value after kernel execution: C[0] = 5.00000000 |
(2) Using the wrapper, however, does not work.
Output:
arm_release_ver: g13p0-01eac0, rk_so_ver: 10
|----------------.------------------------------------------------------------|
| Device ID    0 | Mali-G610 r0p0                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-G610 r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 3.0 (Linux)                                                |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15957 MB RAM, 1024 KB global / 32 KB local                 |
| Buffer Limits  | 15957 MB global, 16340404 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| Info: Value before kernel execution: C[0] = 1.00000000 |
| Info: Value after kernel execution: C[0] = 1.00000000 |
It seems that the kernel execution had no effect: C[0] is still 1.00000000 afterwards.
Sharing host memory with the device (zero-copy) does not seem to work.
In the `Memory` class, `allocate_device_buffer()` creates the buffer with `CL_MEM_USE_HOST_PTR` when zero-copy is enabled:
```cpp
inline void allocate_device_buffer(Device& device, const bool allocate_device, const bool allow_zero_copy) {
	// ... (omitted)
	device_buffer = cl::Buffer( // if(is_zero_copy) { don't allocate extra memory on CPUs/iGPUs } else { allocate VRAM on GPUs }
		device.get_cl_context(),
		CL_MEM_READ_WRITE|((int)is_zero_copy*CL_MEM_USE_HOST_PTR)|((int)device.info.patch_intel_gpu_above_4gb<<23), // for Intel GPUs set flag CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL = (1<<23)
		is_zero_copy ? ((capacity()+63ull)/64ull)*64ull : capacity(), // buffer capacity must be a multiple of 64 Bytes for CL_MEM_USE_HOST_PTR
		is_zero_copy ? (void*)host_buffer : nullptr, // zero-copy: host_buffer itself is passed as the buffer's storage
		&error
	);
}
```
After setting `allow_zero_copy = false`, it works: C[0] = 5.00000000
Memory
But when using
Memory
the result is C[0] = 2.00861049E13.
FYR: "Do not create buffers with CL_MEM_USE_HOST_PTR if possible"
Hi @jackwei86,
thanks a lot for reporting this! I have fixed it by disabling zero-copy on ARM iGPUs.
Kind regards, Moritz