
[Performance] clBuildProgram taking too long

Open sumant85 opened this issue 5 years ago • 3 comments

Describe the problem

We have a mobile application, and for our use case the very first run of the model needs to be as fast as possible (since it affects user engagement). After profiling, it looks like a significant amount of time during the initial run is spent in kernel compilation (clBuildProgram specifically). This can take as long as 7 seconds for some of our models.

I read https://mace.readthedocs.io/en/latest/user_guide/advanced_usage.html#tuning-for-specific-soc-s-gpu, so I know it is possible to ship pre-compiled kernels per SoC, but given our mobile app use case, we don't know beforehand which SoC the model will run on.

I specifically wanted to ask whether the MACE team has experimented with replacing https://github.com/XiaoMi/mace/blob/master/mace/core/runtime/opencl/opencl_runtime.cc#L480 with the async version (https://github.com/ARM-software/ComputeLibrary/blob/master/include/CL/cl2.hpp#L6293), which invokes a callback once the build completes, and whether we can expect any improvement in model loading time from this approach.

sumant85 avatar Jun 15 '19 01:06 sumant85

@sumant85 You can execute the initial run in a sub-thread. Wouldn't that give the same performance as async compilation?
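For illustration, the sub-thread suggestion could look roughly like the sketch below. It is a minimal sketch, not MACE code: `WarmUpModel` and `StartWarmUp` are hypothetical names, and the sleep merely stands in for the first model run that triggers kernel compilation.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for the first model run, which is where the
// OpenCL kernels get compiled. A real app would run the model here.
bool WarmUpModel() {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  return true;  // report that the warm-up finished
}

// Start the warm-up on a background thread so the calling thread
// (for example the UI thread) is not blocked while kernels compile.
std::future<bool> StartWarmUp() {
  return std::async(std::launch::async, WarmUpModel);
}
```

The caller would keep the returned future and call `get()` on it just before the first real inference. This hides the compilation latency from the calling thread but does not shorten the compilation itself.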

lu229 avatar Jun 17 '19 01:06 lu229

@lu229 Thanks for the response! My question was more about enqueuing multiple compilation jobs in parallel (as opposed to waiting for each compilation to finish before launching the next one). From the OpenCL documentation, it seems that some drivers might be able to run these compilations in parallel, thus reducing the overall compilation time.

A similar question was asked at https://community.amd.com/thread/157981 a long time ago, but I am not sure whether current OpenCL drivers support this functionality (or whether the MACE team has tried this approach). Thanks!
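To make the idea concrete, here is a minimal sketch of launching all compilation jobs up front and waiting for them together. It uses only the C++ standard library: `BuildProgram` is a hypothetical placeholder for one blocking clBuildProgram call, and `BuildAllInParallel` is an illustrative name, not an existing MACE function. Whether real builds actually overlap would still depend on the OpenCL driver.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder for one blocking clBuildProgram call.
// It sleeps to simulate work and treats an empty source as a failed build.
bool BuildProgram(const std::string &source) {
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  return !source.empty();
}

// Launch one build per source without waiting in between,
// then collect the results in the original order.
std::vector<bool> BuildAllInParallel(const std::vector<std::string> &sources) {
  std::vector<std::future<bool>> jobs;
  jobs.reserve(sources.size());
  for (const std::string &src : sources) {
    // src stays valid: every future is waited on before this function returns.
    jobs.push_back(
        std::async(std::launch::async, [&src] { return BuildProgram(src); }));
  }

  std::vector<bool> results;
  results.reserve(jobs.size());
  for (std::future<bool> &job : jobs) {
    results.push_back(job.get());  // blocks until that build has finished
  }
  assert(results.size() == sources.size());
  return results;
}
```

With this shape, total wall time approaches that of the slowest single build rather than the sum of all builds, but only if the driver does not serialize the builds internally.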

sumant85 avatar Jun 17 '19 02:06 sumant85

@sumant85 Thanks for your reply, I see what you mean. I discussed it with @nolanliou and we think it makes sense. I will try running the compilations in parallel, and if the performance improves, we will integrate it into MACE.

lu229 avatar Jun 17 '19 04:06 lu229