deepdetect DeepDetect with Apple Silicon ARM M1 support via NCNN with Vulkan

This PR adds Vulkan support to DeepDetect for Apple ARM M1 with NCNN.

At the moment:

PR relies on #1103
Vulkan support is Apple ARM M1 only for now; it will be extended to other GPUs, including Nvidia, since Vulkan already supports them
M1 GPU performance is ~30% faster than M1 CPU (~20 ms for a SqueezeNet-SSD call vs. 30 ms, including image transform, etc.)
PR contains a USE_OPENMP flag since Apple does not support OpenMP (!!)

To be added/fixed:

[x] An APPLE_M1 flag to isolate dedicated configuration (e.g. Homebrew-related settings)
[x] Support OpenMP on M1
[ ] Small performance report
[ ] It seems the Vulkan SDK cannot be downloaded directly with curl/wget because some JS code is in the way, so we are mirroring the SDK for now so that the download can happen from cmake directly.

To build for Apple ARM M1:

Requirements (may not be exhaustive):

brew install lmdb boost eigen rapidjson libarchive spdlog curlpp utf8cpp gflags

!! Make sure you are using the arm64 Homebrew with support for M1, as suggested here: https://github.com/Homebrew/discussions/discussions/149#discussioncomment-132932
This PR expects the arm64 Homebrew to be installed in /opt/homebrew/.
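
A quick way to check the Homebrew requirement before building is sketched below. This is only an illustration, not part of the PR: the function name check_m1_homebrew is hypothetical, and it assumes that `uname -m` prints arm64 in a native M1 shell and that `brew --prefix` prints the prefix of the Homebrew found first on PATH.

```shell
# Hypothetical helper (not part of the PR): verify that the shell is arm64
# and that Homebrew lives where this PR expects it (/opt/homebrew).
check_m1_homebrew() {
  arch="$1"    # e.g. the output of `uname -m`
  prefix="$2"  # e.g. the output of `brew --prefix`
  if [ "$arch" != "arm64" ]; then
    echo "shell reports '$arch', not arm64 (possibly running under Rosetta)" >&2
    return 1
  fi
  if [ "$prefix" != "/opt/homebrew" ]; then
    echo "Homebrew prefix is '$prefix', expected /opt/homebrew" >&2
    return 1
  fi
  echo "arm64 Homebrew found in $prefix"
}
```

On the build machine it could be called as `check_m1_homebrew "$(uname -m)" "$(brew --prefix)"`; a non-zero exit status suggests the wrong Homebrew would be picked up.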

Build DD

cmake .. -DUSE_BOOST_BACKTRACE=OFF -DUSE_CPU_ONLY=ON -DUSE_HTTP_SERVER=OFF \
-DUSE_HTTP_SERVER_OATPP=ON -DUSE_CAFFE=OFF -DUSE_NCNN=ON -DWARNING=OFF \
-DAPPLE_M1=ON -DUSE_VULKAN=ON -DUSE_HDF5=OFF \
-DBUILD_TESTS=ON -DBUILD_SPDLOG=ON

Dec 29 '20 10:12 beniz

@beniz Apple's Clang/LLVM does support OpenMP, and it shows good scaling on the M1. Apple does not provide the OpenMP library so you must either

When compiling with Clang/LLVM, remember to add -Xclang -fopenmp to CPPFLAGS and -lomp to LIBS, so a minimal compile might be:

g++ -o hello -Xclang -fopenmp hellomp.c -lomp
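
The contents of hellomp.c are not shown in the thread. As a minimal sketch, assuming it is an ordinary OpenMP "hello" program, the helper below (write_hellomp is a hypothetical name) writes such a file so that the compile line above has something to build:

```shell
# Hypothetical helper: write an assumed hellomp.c; the real file is not in the thread.
write_hellomp() {
  cat > "${1:-hellomp.c}" <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Every thread in the parallel region prints its own id. */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
EOF
}
```

After `write_hellomp hellomp.c`, the compile command quoted above can be run as is. If the compiler cannot find omp.h or libomp, pointing it at the directory printed by `brew --prefix libomp` with -I and -L flags is one possible workaround, though whether that is needed depends on the local Homebrew setup.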

Jan 02 '21 21:01 neurolabusc

Ah, thanks @neurolabusc. I had stumbled onto https://github.com/Tencent/ncnn/blob/54c0a13b9fe062141ae8d10b49a7bda0829a012a/.github/workflows/release.yml#L165 which includes building OpenMP as a step for the NCNN macOS builds.

I will try when I have a moment, and I have updated the list of bullet points accordingly. In any case, this should close the gap between CPU and GPU on the M1 with batch size 1. Actually, larger batch sizes with NCNN rely on OpenMP AFAIK, and thus may not benefit from true parallelism at the GPU level.

Jan 03 '21 18:01 beniz

This pull request is now in conflict :(

Jan 04 '21 15:01 mergify[bot]

This pull request is now in conflict :(

Jan 06 '21 14:01 mergify[bot]

This pull request is now in conflict :(

Jan 09 '21 10:01 mergify[bot]

@mergifyio update

Jan 13 '21 08:01 beniz

Command update: success

Branch has been successfully updated

Jan 13 '21 08:01 mergify[bot]

This pull request is now in conflict :(

Jan 13 '21 23:01 mergify[bot]

This pull request is now in conflict :(

Jan 27 '21 18:01 mergify[bot]

This pull request is now in conflict :(

Mar 08 '21 17:03 mergify[bot]

This pull request is now in conflict :(

Sep 26 '22 10:09 mergify[bot]

update

❌ Base branch update has failed

merge conflict between base and head err-code: 35C31

Mar 16 '23 15:03 mergify[bot]

This pull request is now in conflict :(

Mar 16 '23 15:03 mergify[bot]

brew install pkg-config opencv@4 libomp
# for pytorch
python3 -m pip install typing_extensions pyyaml

May 03 '23 14:05 Bycob

This pull request is now in conflict :(

May 15 '23 07:05 mergify[bot]

This pull request is now in conflict :(

May 24 '23 13:05 mergify[bot]

This pull request is now in conflict :(

Jun 30 '23 15:06 mergify[bot]

This pull request is now in conflict :(

Jan 08 '24 09:01 mergify[bot]
