Results 22 comments of Liangliang He

It seems there is no solution: https://github.com/youfou/wxpy/issues/321

For deep learning inference on mobile devices with GPU/OpenCL support, you can check out [MACE](https://github.com/xiaomi/mace), which supports Adreno, Mali, and PowerVR GPUs. Here are some [benchmark results](https://github.com/XiaoMi/mace/issues/1).

cl::CommandQueue is a reference-counted wrapper; here is the source:
* https://github.com/KhronosGroup/OpenCL-CLHPP/blob/master/input_cl2.hpp#L1540
* https://github.com/KhronosGroup/OpenCL-CLHPP/blob/master/input_cl2.hpp#L1783
* https://github.com/KhronosGroup/OpenCL-CLHPP/blob/master/input_cl2.hpp#L6632

We have encountered a similar crash inside libOpenCL, and it finally turned out to be caused by memory corruption...
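The retain/release pattern used by the OpenCL C++ wrapper can be sketched without OpenCL itself. This is a minimal illustration, not the CLHPP implementation: `FakeHandle` stands in for a raw `cl_command_queue`, and the increments/decrements stand in for `clRetainCommandQueue`/`clReleaseCommandQueue`.

```cpp
#include <cassert>

// Hypothetical stand-in for a raw OpenCL handle; the real handle's
// refcount lives inside the driver, not in user code.
struct FakeHandle {
    int refcount = 1;
};

// Minimal sketch of the CLHPP wrapper pattern: copies retain the
// underlying handle, destruction releases it.
class CommandQueueLike {
public:
    // Takes ownership of an already-retained handle.
    explicit CommandQueueLike(FakeHandle* h) : handle_(h) {}
    CommandQueueLike(const CommandQueueLike& other) : handle_(other.handle_) {
        if (handle_) ++handle_->refcount;  // stands in for clRetainCommandQueue
    }
    CommandQueueLike& operator=(const CommandQueueLike& other) {
        if (this != &other) {
            Release();
            handle_ = other.handle_;
            if (handle_) ++handle_->refcount;
        }
        return *this;
    }
    ~CommandQueueLike() { Release(); }
    FakeHandle* get() const { return handle_; }

private:
    void Release() {
        if (handle_) --handle_->refcount;  // stands in for clReleaseCommandQueue
    }
    FakeHandle* handle_;
};
```

The practical consequence is that the last wrapper to be destroyed frees the driver object, so a crash inside libOpenCL at release time often points to the refcount (or the heap around it) having been corrupted earlier.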

@sumant85 This is by design. MACE is a fundamental library rather than user-facing application-level code, so the design is focused on correctness and we adopt fail-fast instead...

@sumant85 Thanks for pointing this out; it makes sense. This OpenCL platform-related code should be handled with runtime errors instead of asserts, and we should fix that. It will be great...
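The distinction in these two comments can be made concrete. This is an illustrative sketch (the names are not MACE's actual API): internal invariant violations fail fast at the point of the bug, while conditions outside the library's control, such as a missing OpenCL platform, are reported as recoverable errors.

```cpp
#include <cassert>
#include <cstdlib>

// Illustrative status type for recoverable, environment-dependent errors.
enum class Status { kSuccess, kPlatformUnavailable };

// Runtime-error style: a missing OpenCL platform is the environment's
// fault, so the caller gets a status and can fall back (e.g. to CPU).
Status InitGpuRuntime(bool platform_present) {
    if (!platform_present) return Status::kPlatformUnavailable;
    return Status::kSuccess;
}

// Fail-fast style: a violated internal invariant means the library
// itself is buggy, so terminate immediately rather than run on
// corrupted state.
void CheckInvariant(bool invariant_holds) {
    if (!invariant_holds) std::abort();
}
```

The design question is thus not fail-fast versus error codes in general, but which failures are programming bugs (assert) and which are legitimate runtime conditions (status).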

The daily benchmark results are available here:
* https://gitlab.com/llhe/mace-models/pipelines
* 2018/06/29 https://gitlab.com/llhe/mace-models/-/jobs/78152526

@DiamonJoy The benchmark is actually the CI result of the [MACE Model Zoo](https://github.com/XiaoMi/mace-models) project. Until now, our efforts have mainly focused on the float data type and the CPU/GPU runtimes, and we have not had enough...

**Tuned** means the OpenCL kernel is tuned for the specific type of device instead of using a general rule.

@robertwgh In our original use case, we deploy each model against a specific device (usually a new product), so we want it to be ultimately optimized by brute-force search against...
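The brute-force tuning idea above can be sketched as an exhaustive search over candidate configurations. This is a simplified illustration, not MACE's tuner: `run_kernel` stands in for launching the OpenCL kernel with a given local work-group size and measuring its latency, and the tuner simply keeps the fastest candidate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical brute-force tuner: try every candidate local work-group
// size and return the one with the lowest measured cost. In a real
// tuner, run_kernel would launch the kernel on the target device and
// return its wall-clock (or profiling-event) time.
std::array<std::size_t, 2> BruteForceTune(
    const std::vector<std::array<std::size_t, 2>>& candidates,
    const std::function<double(const std::array<std::size_t, 2>&)>& run_kernel) {
    double best_time = std::numeric_limits<double>::max();
    std::array<std::size_t, 2> best = candidates.front();
    for (const auto& lws : candidates) {
        const double t = run_kernel(lws);  // time one launch with this config
        if (t < best_time) {
            best_time = t;
            best = lws;
        }
    }
    return best;  // the winning config would be cached per device/model
}
```

Since each model targets one known device, the search cost is paid once offline and the winning configuration is shipped with the model, which is why an exhaustive search is acceptable here.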