OpenCL for OpenCV on Pi

Open spinoza1791 opened this issue 6 years ago • 38 comments

Has anyone been able to run OpenCV on the Pi GPU using OpenCL? Is there an example somewhere demonstrating accessing the GPU with OpenCV using OpenCL?

spinoza1791 avatar Apr 10 '18 01:04 spinoza1791

Being able to run OpenCV is one of the goals of this project. But at this stage it is neither tested nor expected to work.

doe300 avatar Apr 12 '18 11:04 doe300

Will you announce here once VC4CL is compatible with OpenCV?

spinoza1791 avatar Apr 12 '18 13:04 spinoza1791

It's a shame. It seems the GPU would help the maker community a lot, but there are no results so far anywhere. Damn, Nvidia is taking all the AI. I purchased the Movidius stick while I wait for this to happen; let's see how it goes once I get it.

masterchop avatar Jun 19 '18 05:06 masterchop

What about Caffe? Caffe is currently the only deep learning library to properly support OpenCL 1.2. If this can be made to work with Caffe, it'll be really powerful for AI work.

soulslicer avatar Jul 19 '18 04:07 soulslicer

An OpenCL-enabled build of OpenCV is available on my GitHub: https://github.com/thortex/rpi3-opencv/ https://github.com/thortex/rpi3-opencv/releases/tag/v3.4.2-opencl

I'm running the original test cases provided by OpenCV: https://github.com/thortex/rpi3-vc4cl/ https://github.com/thortex/rpi3-vc4cl/releases https://github.com/thortex/rpi3-vc4cl/tree/master/test/opencv

There are 16,182 tests from 132 test cases. I've run 2,865 tests and got 1,555 NGs and 1,308 OKs.

opencv-opencl-test.zip https://github.com/doe300/VC4CL/files/2216741/opencv-opencl-test.zip

thortex avatar Jul 22 '18 05:07 thortex

There are 16,182 tests from 132 test cases. I've run 2,865 tests and got 1,555 NGs and 1,308 OKs.

That sounds promising. For the failed tests, how good are the outputs? Are they any good to debug issues in VC4CL?

doe300 avatar Jul 22 '18 07:07 doe300

Fantastic Thor! My goal is to run OpenCV on Pi against the GPU, while also running a small YOLO model against the Pi CPU (NNPack).

This way I can spread the load, so that video processing and NN do not compete for the same processor.

Do you think this is possible with your release?

spinoza1791 avatar Jul 22 '18 15:07 spinoza1791

log.arith.zip

I attached the OpenCL arithmetic test results for OpenCV 3.4.2 above.

Warning/Error summary is listed below:

  • Warning: OpenCV uses clGetProgramInfo API to save OpenCL program cache.
[ WARN:0] Can't save OpenCL binary into cache: /root/.cache/opencv/3.4.2/opencl_cache/32-bit--Broadcom--VideoCore_IV_GPU--0_4/core--lut_02217d060320fc126306ad16885be711.bin
OpenCV(3.4.2) /home/pi/rpi3-opencv/setup/opencv-3.4.2/modules/core/src/ocl.cpp:3752: error: (-220:Unknown error code -220) OpenCL error CL_INVALID_VALUE (-30) during call: clGetProgramInfo(handle, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, NULL) in function 'getProgramBinary'
  • Line#264: 64-bit operations are not supported by the VideoCore IV architecture, ...
  • Line#1103: ./modules/core/test/ocl/test_arithm.cpp:179: Failure (Add/Subtract/Multiply)
  • Line#7243: MinMax test.

If VC4CL supported clGetProgramInfo(), OpenCV could cache the compiled programs, which would shorten the time needed to run the OpenCL tests.

thortex avatar Jul 28 '18 06:07 thortex

Hi spinoza1791, I don't think it will work yet. We have to debug VC4CL properly before OpenCV can run in OpenCL mode.

thortex avatar Jul 28 '18 06:07 thortex

Thanks @thortex for testing this and the logs.

A few quick comments on the log:

  • VC4CL supports 64-bit types only in a very limited scope (when they are statically convertible to 32-bit types). I don't know whether you can deactivate 64-bit types in the OpenCV tests, but these tests will probably never pass.
  • I will have to look into the CL_INVALID_VALUE warning from clGetProgramInfo. It looks like VC4CL does something wrong there.
  • Even if it works, I would not enable caching for now, since the compiler (and the code it generates) is currently the part of VC4CL that changes the most.
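
For anyone who wants to keep OpenCV from writing compiled kernels to its cache while the compiler is still changing, here is a minimal sketch. It assumes that OpenCV reads the OPENCV_OPENCL_CACHE_ENABLE environment variable when it initialises its OpenCL runtime; please check modules/core/src/ocl.cpp of your OpenCV version before relying on it.

```python
import os

# Assumption: OpenCV reads OPENCV_OPENCL_CACHE_ENABLE when its OpenCL runtime
# is initialised, so the variable has to be set before cv2 is imported.
os.environ["OPENCV_OPENCL_CACHE_ENABLE"] = "false"

try:
    import cv2  # optional: the sketch still runs on a machine without OpenCV
except ImportError:
    cv2 = None

if cv2 is not None:
    # haveOpenCL() only reports whether an OpenCL runtime could be found
    print("OpenCL available:", cv2.ocl.haveOpenCL())
```

Setting the variable in the shell before starting the test binaries should have the same effect, under the same assumption.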

doe300 avatar Jul 28 '18 08:07 doe300

Thanks @doe300,

I checked the following fails:

Line#1103: ./modules/core/test/ocl/test_arithm.cpp:179: Failure (Add/Subtract/Multiply)

I found the following failure patterns:

  • Signed 8-bit matrix operations pass, but unsigned 8-bit ones fail.
  • Signed 16-bit matrix operations pass, but unsigned 16-bit ones fail.
  • float64 matrix operations pass, but float32 ones fail.

An example result of an Add operation on unsigned 8-bit matrices is:

[128  30   7     [ 56  32  55    [255 255 255
  20  15   4   +   89  55  12  =  255 255 255
  50  25   5]      11  89  98]    255 255 255]

The expected result is:

[128  30   7     [ 56  32  55    [184  62  62
  20  15   4   +   89  55  12  =  109  70  16
  50  25   5]      11  89  98]     61 114 103]

I don't know why, but VC4CL returns 0xFF for every element of the result.
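
For reference, the expected matrix above can be reproduced with a plain saturating add. This is only a pure-Python sketch of the semantics (OpenCV's add clamps unsigned 8-bit sums to 255); `saturating_add` is a hypothetical helper and not OpenCV or VC4CL code.

```python
def saturating_add(a, b, bits=8):
    """Element-wise add of two equally shaped unsigned matrices, clamped to the type's maximum."""
    max_value = (1 << bits) - 1
    return [[min(x + y, max_value) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Input and expected matrices copied from the test output above
src1 = [[128, 30, 7], [20, 15, 4], [50, 25, 5]]
src2 = [[56, 32, 55], [89, 55, 12], [11, 89, 98]]
expected = [[184, 62, 62], [109, 70, 16], [61, 114, 103]]

result = saturating_add(src1, src2)
# Count how many elements were clamped to the maximum value
clamped = sum(value == 255 for row in result for value in row)
print(result == expected, clamped)
```

None of these sums exceeds 255, so a correct implementation never clamps for this input. An output of all 0xFF therefore suggests that the clamping step fires unconditionally.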

thortex avatar Aug 02 '18 20:08 thortex

Those are some interesting results, I will have to look into it. What is the output for unsigned 16-bit integers? Are all elements set to 0xFFFF?

The 64-bit floating-point test cannot pass; it is probably just skipped.

doe300 avatar Aug 03 '18 05:08 doe300

mini.zip

I attached mini.zip (including six test results for matrix add arithmetic operation).

CPU only: Matrix src1 + Matrix src2 = Matrix dst1
OpenCL (VC4CL): Matrix usrc1 + Matrix usrc2 = Matrix udst1

What is the output for unsigned 16-bit integers? Are all elements set to 0xFFFF?

Yes, it's 65535 (0xFFFF), as shown on line #722 of mini.log.

mini-debug.log includes VC4CL debugging outputs.

thortex avatar Aug 04 '18 18:08 thortex

I found and fixed a bug in conversion with saturation and the failing tests now succeed.

doe300 avatar Aug 11 '18 12:08 doe300

Thanks doe300!!!

I'll also check other tests with bedb33c8d6241bab60e9ca3954b20faf8fbf7af3.

thortex avatar Aug 11 '18 12:08 thortex

So does this mean that it's possible to run some OpenCV functions with OpenCL acceleration? Do you have a minimal benchmark of a single function with and without OpenCL acceleration to give us an idea of the possible speedup? I am very curious about it!
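
A minimal benchmark of that kind could look like the sketch below. `median_seconds` and `compare_blur` are hypothetical helper names, GaussianBlur is an arbitrary choice of function, and the OpenCL path assumes a working OpenCL device is visible to OpenCV. It has not been run against VC4CL, so no timings are claimed.

```python
import statistics
import time


def median_seconds(fn, repeats=10):
    """Call fn() once untimed as a warm-up, then return the median time of `repeats` timed calls."""
    fn()  # warm-up, so one-off costs such as kernel compilation are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_blur(size=1024, repeats=10):
    """Time cv2.GaussianBlur on a numpy array (CPU) and on a cv2.UMat (OpenCL, if usable)."""
    import cv2
    import numpy as np

    image = np.random.randint(0, 256, (size, size), dtype=np.uint8)

    cv2.ocl.setUseOpenCL(False)
    cpu = median_seconds(lambda: cv2.GaussianBlur(image, (5, 5), 0), repeats)

    cv2.ocl.setUseOpenCL(True)
    if not cv2.ocl.useOpenCL():
        return cpu, None  # OpenCV found no usable OpenCL device
    gpu_image = cv2.UMat(image)
    # .get() copies the result back to host memory, so the transfer cost is included
    gpu = median_seconds(lambda: cv2.GaussianBlur(gpu_image, (5, 5), 0).get(), repeats)
    return cpu, gpu
```

Calling `compare_blur()` on the Pi returns a pair of median times in seconds, with None in the second slot when OpenCL could not be enabled.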

julled avatar Aug 29 '18 21:08 julled

@doe300 and @thortex, is it plausible that results could be slower? I've been using opencv_perf_imgproc for comparisons when looking at 32-bit Pi vs 64-bit Pi, so I thought I'd fire this up again with OpenCL using the work here (which is really great and interesting, thanks a lot). I took OpenCV from @thortex's repo and then rebuilt it so I got the perf binaries (on the latest 32-bit Raspbian OS).

There are a few crashes, but on the test subsets that run OK, so far I see some times coming in similar and some quite a lot slower (e.g. by a factor of 10).

vb216 avatar Sep 08 '18 16:09 vb216

Yes, it is possible. As mentioned in various other posts, memory access (especially write access) is a bottleneck, although there are still some optimizations left to be done.

doe300 avatar Sep 09 '18 01:09 doe300

If I may ask, what is the current status of OpenCV support? Do most operations work? Has anyone tried it with the DNN module? I've got MobileNetV2 running with ~0.2 s inference time using optimized CPU-based OpenCV. I would be interested to see if it could go faster using the GPU.

charlesrwest avatar Nov 28 '18 02:11 charlesrwest

I haven't made any progress on testing OpenCV. The problem is that it is hard to test and not very suitable for debugging issues with wrongly generated code.

doe300 avatar Nov 28 '18 06:11 doe300

@spinoza1791 @doe300 I've successfully compiled the latest OpenCV (4.0.1-dev) with VC4CL OpenCL without any compilation errors. I included VC4CL OpenCL in the OpenCV compilation and also in the FFmpeg build. I'll update here with benchmark results.

abhiTronix avatar Feb 05 '19 05:02 abhiTronix

Great! Looking forward to seeing benchmarks for Pi!

spinoza1791 avatar Feb 05 '19 06:02 spinoza1791

Also updated: https://github.com/thortex/rpi3-opencv/releases/tag/v4.0.1

thortex avatar Feb 05 '19 22:02 thortex

Perfect. U da man!

spinoza1791 avatar Feb 05 '19 22:02 spinoza1791

Also updated: https://github.com/thortex/rpi3-opencv/releases/tag/v4.0.1

@thortex Looking at your build script, I think you're using OpenCV's inbuilt OpenCL module, not the one provided by this repo. Check your cmake output again for confirmation, or check the output of print(cv2.getBuildInformation()).
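
To pick the relevant part out of that output, something like the sketch below may help. `opencl_section` is a hypothetical helper, and the sample text (including its path) is made up to illustrate the usual layout of the build information; your real output may differ.

```python
def opencl_section(build_info):
    """Return the 'OpenCL:' line and its indented sub-lines from a build-information dump."""
    lines = build_info.splitlines()
    section = []
    for index, line in enumerate(lines):
        if line.strip().startswith("OpenCL:"):
            indent = len(line) - len(line.lstrip())
            section.append(line.strip())
            # Collect the lines that are indented deeper than the 'OpenCL:' line
            for follower in lines[index + 1:]:
                if not follower.strip():
                    break
                if len(follower) - len(follower.lstrip()) <= indent:
                    break
                section.append(follower.strip())
            break
    return section


# Illustrative sample only; the include path is a placeholder
sample = """
  OpenCL:                        YES (no extra features)
    Include path:                /path/to/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 3:
"""

try:
    import cv2
    text = cv2.getBuildInformation()
except ImportError:
    text = sample  # fall back to the sample when OpenCV is not installed

print("\n".join(opencl_section(text)))
```

As far as I know, the "Link libraries" line is where a dynamically loaded OpenCL runtime shows up, as opposed to a library linked at build time.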

abhiTronix avatar Feb 06 '19 06:02 abhiTronix

@abhiTronix I used the dynamic load feature of OpenCV.

thortex avatar Feb 07 '19 20:02 thortex

I successfully installed your OpenCL version of OpenCV on Pi 3B+ via https://github.com/thortex/rpi3-opencv/releases/tag/v4.0.1. What else is needed to install VC4CL and test?

spinoza1791 avatar Feb 08 '19 03:02 spinoza1791

@thortex I'm referring to the script in your GitHub repo. OpenCV prioritizes its inbuilt libraries over system libraries, so system libraries have to be linked with OpenCV manually to make them work. VC4CL can't be dynamically linked with OpenCV directly unless its path is specified or it is linked at runtime.

abhiTronix avatar Feb 08 '19 03:02 abhiTronix

@abhiTronix, thanks for reviewing. I added a dependency on the ICD OpenCL library in https://github.com/thortex/rpi3-opencv/commit/d35998fcdbac12de17856c82b7204656b2631a94. This release depends on https://github.com/thortex/rpi3-vc4cl/
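
To check which OpenCL implementations the ICD loader would see, a sketch like the following can be used. It assumes the conventional Linux location /etc/OpenCL/vendors for the .icd files; `list_icd_libraries` is a hypothetical helper, and where VC4CL's own .icd file ends up depends on how it was installed.

```python
import glob
import os


def list_icd_libraries(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file in vendors_dir to the library path or name it contains."""
    libraries = {}
    for icd_path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(icd_path, encoding="utf-8") as handle:
            libraries[os.path.basename(icd_path)] = handle.read().strip()
    return libraries


for name, library in list_icd_libraries().items():
    # An .icd file may hold a bare library name instead of a full path
    status = "found" if os.path.exists(library) else "not found as a literal path"
    print(f"{name}: {library} ({status})")
```

If nothing is printed, no .icd files were found in that directory and the ICD loader probably has no implementation to hand to OpenCV.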

thortex avatar Feb 09 '19 05:02 thortex

Thanks for the updates to your repo. I compiled it and tried to execute the performance test. Without sudo it won't run the OpenCL extensions, which is a handy way for me to test the difference.

Anyway, with sudo, I noticed that the first few tests of the perf_imgproc run go OK, but then I start getting these kernel messages and the process seems to hang.

I'm not pushing for a solution; this is just in case the info helps anyone's work or findings. FYI, it's a Pi 3B+, 4.14.79-v7+ #1159 SMP Sun Nov 4 17:50:20 GMT 2018 armv7l GNU/Linux

[37353.721913] INFO: task kworker/0:2:1865 blocked for more than 120 seconds.
[37353.721921] Tainted: G C 4.14.79-v7+ #1159
[37353.721923] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[37353.721927] kworker/0:2 D 0 1865 2 0x00000000
[37353.721943] Workqueue: events get_throttled_poll
[37353.721961] [<8079ef70>] (__schedule) from [<8079f5d8>] (schedule+0x50/0xa8)
[37353.721969] [<8079f5d8>] (schedule) from [<8079fa50>] (schedule_preempt_disabled+0x18/0x1c)
[37353.721978] [<8079fa50>] (schedule_preempt_disabled) from [<807a1358>] (__mutex_lock.constprop.3+0x190/0x58c)
[37353.721986] [<807a1358>] (__mutex_lock.constprop.3) from [<807a1870>] (__mutex_lock_slowpath+0x1c/0x20)
[37353.721994] [<807a1870>] (__mutex_lock_slowpath) from [<807a18d0>] (mutex_lock+0x5c/0x60)
[37353.722002] [<807a18d0>] (mutex_lock) from [<8063cdd0>] (rpi_firmware_transaction+0x44/0xac)
[37353.722012] [<8063cdd0>] (rpi_firmware_transaction) from [<8063cf30>] (rpi_firmware_property_list+0xf8/0x208)
[37353.722019] [<8063cf30>] (rpi_firmware_property_list) from [<8063d0a4>] (rpi_firmware_property+0x64/0x84)
[37353.722027] [<8063d0a4>] (rpi_firmware_property) from [<8063d278>] (rpi_firmware_get_throttled+0x124/0x214)
[37353.722035] [<8063d278>] (rpi_firmware_get_throttled) from [<8063d3fc>] (get_throttled_poll+0x28/0x54)
[37353.722043] [<8063d3fc>] (get_throttled_poll) from [<801379b4>] (process_one_work+0x158/0x454)
[37353.722050] [<801379b4>] (process_one_work) from [<80137d14>] (worker_thread+0x64/0x5b8)
[37353.722057] [<80137d14>] (worker_thread) from [<8013dd98>] (kthread+0x13c/0x16c)
[37353.722066] [<8013dd98>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)

vb216 avatar Feb 11 '19 09:02 vb216