ysh329 comments

Results: 161 comments by ysh329

[Paper Walkthrough] CLTune: A Generic Auto-Tuner for OpenCL Kernels

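## 6. Lessons on search strategies

The two heuristic algorithms, simulated annealing and particle swarm optimization (PSO), each have their own characteristics. Which one suits a given problem better has to be found out by trying both.

![image](https://user-images.githubusercontent.com/7320657/108442838-b2790780-7292-11eb-81e5-0d5b09cd30b3.png)

**Table: hardware the authors tuned on in their experiments**

The authors' experiments also yield a few lessons:

1. When the user-defined convolution filter is small, it can be placed in OpenCL constant memory.
2. In the 2D convolution experiment, the performance distribution over the full search space shows that only a very small number of configurations perform well. My understanding is that the parameters are strongly correlated, so well-performing configurations are very sparse across the whole search space.
3. In the 2D convolution experiment, simulated annealing and PSO perform well on some hardware and poorly on others, presumably because they fell into a local optimum and could not escape it afterwards.
4. In the matrix multiplication experiment, the best values of the 7 kinds of parameters are listed in the table below. They are essentially different on every device.

There are a few more empirical observations of this kind, but they are all device-specific and do not generalize. Overall, CLTune offers an approach for tuning OpenCL kernels for each hardware device through templated implementations, carrying forward the generality mindset of heterogeneous computing.

> Hand-writing common operators and tuning them is indeed not expensive. In the long run, however, the implementation cost of long-tail operators and operator fusion becomes too high. The tuning strategy still needs to be combined with codegen.

![image](https://user-images.githubusercontent.com/7320657/108442869-befd6000-7292-11eb-85a7-ae617fd00338.png)

![image](https://user-images.githubusercontent.com/7320657/108442881-c6246e00-7292-11eb-949c-9f9f208f8867.png)

![image](https://user-images.githubusercontent.com/7320657/108442901-cfadd600-7292-11eb-9ca8-6aebf88246d7.png)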
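
To make the search-strategy discussion above more concrete, here is a minimal Python sketch of simulated annealing over a small discrete tuning space. It is not CLTune's implementation: the parameter names (`WGS_X`, `WGS_Y`, `UNROLL`, `VECTOR`), their values, the linear cooling schedule, and the `synthetic_runtime` cost function are all assumptions made up for illustration. The cost function is shaped so that only configurations near one sweet spot are fast, loosely mimicking the sparse distribution of good configurations noted in item 2.

```python
import math
import random

# Hypothetical search space: names and values are illustrative, not CLTune's.
SPACE = {
    "WGS_X": [8, 16, 32, 64],
    "WGS_Y": [1, 2, 4, 8],
    "UNROLL": [1, 2, 4, 8],
    "VECTOR": [1, 2, 4],
}


def synthetic_runtime(cfg):
    """Stand-in for a measured kernel time; a real tuner would compile and run the kernel."""
    # Assumed sweet spot: cost grows quickly with distance from it.
    target = {"WGS_X": 32, "WGS_Y": 4, "UNROLL": 4, "VECTOR": 2}
    penalty = sum(abs(math.log2(cfg[k]) - math.log2(target[k])) for k in cfg)
    return 1.0 + penalty ** 2


def neighbour(cfg, rng):
    """Return a copy of cfg with one randomly chosen parameter changed."""
    new = dict(cfg)
    key = rng.choice(sorted(SPACE))
    new[key] = rng.choice([v for v in SPACE[key] if v != cfg[key]])
    return new


def simulated_annealing(steps=200, t_start=4.0, seed=0):
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in SPACE.items()}
    current_cost = synthetic_runtime(current)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Linear cooling; the small epsilon avoids division by zero.
        temperature = t_start * (1.0 - step / steps) + 1e-9
        candidate = neighbour(current, rng)
        cost = synthetic_runtime(candidate)
        # Always accept improvements; accept regressions with Boltzmann probability.
        if cost < current_cost or rng.random() < math.exp((current_cost - cost) / temperature):
            current, current_cost = candidate, cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


if __name__ == "__main__":
    print(simulated_annealing())
```

Accepting worse configurations early, while the temperature is high, is what gives the search a chance to leave a local optimum. Once the temperature has dropped, it behaves like greedy descent, which is one plausible reading of the behaviour described in item 3.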

Even Faster CNNs Exploring the New Class of Winograd Algorithms

![image](https://user-images.githubusercontent.com/7320657/101725378-83545600-3aeb-11eb-81da-d6561abfe293.png)

Even Faster CNNs Exploring the New Class of Winograd Algorithms

![image](https://user-images.githubusercontent.com/7320657/101725449-a8e15f80-3aeb-11eb-97c6-b3203b131cda.png)

How to adjust and query AMD GPU clock frequency?

# aticonfig

I tried the PowerXpress options, but the result is disappointing.

```shell
gpu@gpu-FP4:~/yuanshuai/code/CLBlast/build$ sudo aticonfig --px-list-active-gpu
PowerXpress: Discrete GPU is active (High-Performance mode).
gpu@gpu-FP4:~/yuanshuai/code/CLBlast/build$ sudo aticonfig --pxl
PowerXpress: Discrete GPU is...
```

How to adjust and query AMD GPU clock frequency?

Besides, I found a tool named AGT from this link: [Manage your GPU HW · amd/OpenCL-caffe Wiki](https://github.com/amd/OpenCL-caffe/wiki/Manage-your-GPU-HW). However, it seems to be a Windows tool ([Download AMD GPU Clock Tool | TechPowerUp](https://www.techpowerup.com/download/amd-gpu-clock-tool/)...

common Error Q&A

## Adreno GPU SDK - FAQs - Qualcomm Developer Network

https://developer.qualcomm.com/software/adreno-gpu-sdk/faq

### What is included in the Adreno SDK for OpenCL?

This SDK includes usage examples for Qualcomm Technologies extensions...

common Error Q&A

OpenCL Tips · yszheda/wiki Wiki https://github.com/yszheda/wiki/wiki/OpenCL-Tips

common Error Q&A

Sub-optimal performance on Qualcomm Adreno GPUs · Issue #228 · CNugteren/CLBlast https://github.com/CNugteren/CLBlast/issues/228

common Error Q&A

Float16 GEMM on Adreno 330 · Issue #181 · CNugteren/CLBlast https://github.com/CNugteren/CLBlast/issues/181

There is no definite result for float16 yet.

common Error Q&A

local work size and work group size

> ## OpenCL global work size vs local work size
> In both cases the global size is 1024. In case 1, the local size...
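
As a rough illustration of how the two sizes relate, the sketch below computes how many work-groups a one-dimensional NDRange is split into. It assumes the OpenCL 1.x rule that the global size must be evenly divisible by the local size. The global size of 1024 comes from the quoted text; the local sizes in the loop are hypothetical, since the quote is cut off before it states them, and `work_group_count` is a helper name introduced here.

```python
def work_group_count(global_size, local_size):
    """Number of work-groups along one dimension of an NDRange."""
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("local size must be positive and divide the global size evenly")
    return global_size // local_size


GLOBAL_SIZE = 1024  # taken from the quoted text

if __name__ == "__main__":
    for local_size in (16, 64, 256):  # hypothetical local sizes, not from the quote
        groups = work_group_count(GLOBAL_SIZE, local_size)
        print(f"local size {local_size}: {groups} work-groups")
```

The total number of work-items stays at 1024 in every case. Only the grouping changes, which affects how work-items share local memory and synchronize within a group.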