deepdetect
DeepDetect with Apple Silicon ARM M1 support via NCNN with Vulkan
This PR adds Vulkan support to DeepDetect for Apple ARM M1 with NCNN.
At the moment:
- PR relies on #1103
- Vulkan support is Apple ARM M1 only; it will be extended to other GPUs, including Nvidia, as they are already supported by Vulkan
- M1 GPU performance is ~30% faster than M1 CPU (~20 ms for a SqueezeNet-SSD call vs. 30 ms, including image transform, etc.)
- PR contains a USE_OPENMP flag since Apple does not support OpenMP (!!)
To be added/fixed:
- [x] An APPLE_M1 flag to isolate dedicated configuration (e.g. Homebrew stuff)
- [x] Support OpenMP on M1
- [ ] Small report on performances
- [ ] It seems the Vulkan SDK cannot be downloaded directly with curl/wget due to some JS code in the way, so we are mirroring the SDK at the moment so that the download can happen from cmake directly.
To build for Apple ARM M1:
- Requirements (may not be exhaustive):
brew install lmdb boost eigen rapidjson libarchive spdlog curlpp utf8cpp gflags
!! Make sure you are using the arm64 homebrew with support for M1, as suggested here: https://github.com/Homebrew/discussions/discussions/149#discussioncomment-132932
This PR expects the arm64 homebrew to be installed in /opt/homebrew/
- Build DD
cmake .. -DUSE_BOOST_BACKTRACE=OFF -DUSE_CPU_ONLY=ON -DUSE_HTTP_SERVER=OFF \
-DUSE_HTTP_SERVER_OATPP=ON -DUSE_CAFFE=OFF -DUSE_NCNN=ON -DWARNING=OFF \
-DAPPLE_M1=ON -DUSE_VULKAN=ON -DUSE_HDF5=OFF \
-DBUILD_TESTS=ON -DBUILD_SPDLOG=ON
@beniz Apple's Clang/LLVM does support OpenMP, and it shows good scaling on the M1. Apple does not provide the OpenMP library so you must either
When compiling with Clang/LLVM, remember to add -Xclang -fopenmp to CPPFLAGS and -lomp to LIBS, so a minimal compile might be:
g++ -o hello -Xclang -fopenmp hellomp.c -lomp
Ah, thanks @neurolabusc. I had stumbled onto https://github.com/Tencent/ncnn/blob/54c0a13b9fe062141ae8d10b49a7bda0829a012a/.github/workflows/release.yml#L165, which includes building OpenMP as a step for the NCNN macOS builds.
I will try when I have a moment, and have updated the list of bullet points accordingly. In all cases this should close the gap between CPU and GPU on the M1 with batch size 1. Actually, larger batch sizes with NCNN rely on OpenMP AFAIK and thus may not benefit from true parallelism at GPU level.
This pull request is now in conflict :(
@mergifyio update
Command update: success
Branch has been successfully updated
This pull request is now in conflict :(
update
❌ Base branch update has failed
merge conflict between base and head err-code: 35C31
brew install pkg-config opencv@4 libomp
# for pytorch
python3 -m pip install typing_extensions pyyaml