ArmNeonOptimization
Arm neon optimization practice
Why do the later NEON instructions not seem to give much of a speedup? What is the reason for this?
Is there a way to single-step through Arm NEON code? Or, during development, how do you debug Arm NEON assembly or intrinsics?
My run produces the following (left: input; middle: my implementation; right: this implementation; radius=10). Can you try this input? It's a 201x201 FP32 square, stored in little-endian order: [noisy-square.tar.gz](https://github.com/Ldpe2G/ArmNeonOptimization/files/3842605/noisy-square.tar.gz) I use this routine...
With radius 5 on a 3.4 GHz CPU, I used AVX2 to speed up the histogram add and subtract steps, but the run time is greater than 200 ms for a 1280*1024 image.
CMake Error at CMakeLists.txt:35 (add_subdirectory): add_subdirectory given source "third_party/googletest/googletest" which is not an existing directory.