ck-caffe

Performance of lib-caffe-bvlc-clblast-master-gcc-5.4.0-linux-32 on Odroid XU4

Open dohai90 opened this issue 8 years ago • 9 comments

Hello,

I have two Odroid XU4 boards, both with the lib-caffe-bvlc-clblast-master-gcc-5.4.0-linux-32 library installed. The first board has an older version and the second a recent one. I found that the old version performs better than the new one, as shown below.

The new one: image

The old one: image

Could you please tell me why this issue occurred?

Best regards

dohai90 avatar Dec 01 '17 11:12 dohai90

@dohai90 There could be many reasons for the regression but let's start from the most obvious ones:

  1. CLBlast has changed. When did you install your first (old) version?
  2. Is the result repeatable? How do you run the program? If you use ck run program:..., you do not fix the CPU and GPU frequency, so they may change during execution. Using ck benchmark program:... is more reliable.
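A minimal sketch of one way to judge repeatability: time several runs on each board and compare the gap between the boards with the run-to-run spread. The timings below are made up for illustration (they are not from this issue), and `summarize` and `gap_exceeds_noise` are hypothetical helper names, not part of CK.

```python
import statistics


def summarize(times_s):
    """Return (mean, sample standard deviation) of run times in seconds."""
    if len(times_s) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(times_s), statistics.stdev(times_s)


def gap_exceeds_noise(old_times_s, new_times_s, k=2.0):
    """True if the means differ by more than k times the larger spread.

    This is a crude rule of thumb, not a statistical test.
    """
    old_mean, old_sd = summarize(old_times_s)
    new_mean, new_sd = summarize(new_times_s)
    return abs(new_mean - old_mean) > k * max(old_sd, new_sd)


# Made-up timings (seconds) for illustration only.
old_board = [10.1, 10.3, 9.9, 10.2]
new_board = [12.0, 12.4, 11.8, 12.1]

print(summarize(old_board))
print(gap_exceeds_noise(old_board, new_board))
```

If the gap does not exceed the noise, the difference may simply come from frequency changes between runs rather than from the library version.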

psyhtest avatar Dec 01 '17 23:12 psyhtest

Also, please run ck benchmark program (a higher-level "pipeline" that attempts to set up and monitor the CPU/GPU frequency, etc.) in both cases and provide the logs. They will help us see the resolved dependencies and their versions ... Thanks!
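Once both logs exist, a plain line diff is a quick way to spot dependency or version differences. The sketch below uses only Python's standard `difflib`; the log contents and file names are placeholders, since the real CK log format is not assumed here, and `changed_lines` is a hypothetical helper.

```python
import difflib


def changed_lines(old_text, new_text, old_name="old.log", new_name="new.log"):
    """Return only the added/removed lines of a unified diff of two logs."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        n=0,          # no context lines
        lineterm="",
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Placeholder log contents, not real CK output.
old_log = "dependency-a: version 1.0\ndependency-b: version 3.1\n"
new_log = "dependency-a: version 2.0\ndependency-b: version 3.1\n"

for line in changed_lines(old_log, new_log):
    print(line)
```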

gfursin avatar Dec 02 '17 09:12 gfursin

Hello, @psyhtest

  1. I installed the old version 2 or 3 months ago.
  2. First, I used ck run program:caffe and got the above results. With ck benchmark program:caffe, both boards give similar performance.

@gfursin I have attached the logs for both cases. As I understand it, the benchmark utility sets the maximum frequency for both CPU and GPU, am I right? Although the benchmark utility gives similar results in both cases, when I run the same program on the two boards the execution times still differ, even though I have set the maximum frequency for both CPU and GPU via the commands ./CK/ck-env/platform.init/generic-odroid/ck-set-cpu-performance and ./CK/ck-env/platform.init/generic-odroid/ck-set-gpu-performance.

new_device_log.txt https://github.com/dividiti/ck-caffe/files/1525640/new_device_log.txt
old_device_log.txt https://github.com/dividiti/ck-caffe/files/1525641/old_device_log.txt

Could you advise why the same program, run on two boards that have both been set to the maximum frequency, still shows different execution times? If needed, I will upload my program here.

Thank you

dohai90 avatar Dec 04 '17 04:12 dohai90

If I interpret the logs correctly, the execution time is about 67 seconds in both cases? Am I missing something?

psyhtest avatar Dec 04 '17 07:12 psyhtest

@psyhtest Yes, you are right, but that is the benchmark program. However, when I run the same network architecture using the lib-caffe-bvlc-clblast-master-gcc-5.4.0-linux-32 framework, the new one is slower, even though I have set the maximum frequency for CPU and GPU on both boards. It's really strange.

dohai90 avatar Dec 04 '17 07:12 dohai90

Also, from your logs it seems that you use different versions of CK (or at least of the repos). Can you please update with ck pull all? I remember fixing some problems with the scripts: you had to change to the directory containing the scripts before running them, otherwise they wouldn't run properly...

If you still have issues, please run ck-print-cpu-freq (ck-print-gpu-freq) after ck-set-cpu-performance (ck-set-gpu-performance) to check that the CPU (GPU) frequency has been set correctly.
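As an independent cross-check, the standard Linux cpufreq sysfs files can be read directly. This is only a sketch and not what ck-print-cpu-freq itself does; `cpus_below_max` is a hypothetical helper, and it assumes the kernel exposes `scaling_cur_freq` and `cpuinfo_max_freq` for each core. Each core is compared with its own maximum because the big and LITTLE clusters on the XU4 have different limits.

```python
from pathlib import Path


def read_khz(path):
    """Read an integer frequency in kHz from a cpufreq sysfs file."""
    return int(Path(path).read_text().strip())


def cpus_below_max(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (current_khz, max_khz)} for cores below their maximum."""
    below = {}
    for cpufreq in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        cur = read_khz(cpufreq / "scaling_cur_freq")
        top = read_khz(cpufreq / "cpuinfo_max_freq")
        if cur < top:
            below[cpufreq.parent.name] = (cur, top)
    return below
```

Calling `cpus_below_max()` on each board right after ck-set-cpu-performance, and again after a run, should return an empty dict if the setting took effect and held. The GPU is not covered here because its sysfs location depends on the driver.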

psyhtest avatar Dec 04 '17 09:12 psyhtest

See a potential issue below.

psyhtest avatar Dec 04 '17 09:12 psyhtest

BTW, the new CLBlast may have regressions on older architectures (though there are plans to make it more adaptive). To check this, you may want to use the dvdt profiler to profile the OpenCL kernels in Caffe:

$ ck benchmark program:... --dvdt_prof

Also, if you notice some errors in the platform scripts, please feel free to update them and provide a patch. The main idea behind CK is to collaboratively understand regressions/reproducibility issues and solve them ... Thanks!!!

gfursin avatar Dec 05 '17 15:12 gfursin

In the new device log, I see:

Setting GPU frequency to max (if supported) ...

CMD to set GPU frequency:
  export CK_CPU_FREQUENCY=max;/home/odroid/CK/ck-env/platform.init/generic-odroid/ck-set-gpu-performance

I would expect to see export CK_GPU_FREQUENCY here...
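To look for the same kind of mismatch elsewhere in a log, a small scan can flag commands where the exported variable and the script name refer to different units. This is a sketch that assumes the command format seen in the excerpt above; `mismatched_frequency_cmds` is a hypothetical helper, not part of CK.

```python
import re

# Matches "export CK_<UNIT>_FREQUENCY=...;<path>/ck-set-<unit>-performance".
CMD_RE = re.compile(
    r"export CK_(CPU|GPU)_FREQUENCY=[^;]*;\s*\S*ck-set-(cpu|gpu)-performance"
)


def mismatched_frequency_cmds(log_text):
    """Return log lines whose exported variable and script target different units."""
    hits = []
    for line in log_text.splitlines():
        m = CMD_RE.search(line)
        if m and m.group(1).lower() != m.group(2):
            hits.append(line.strip())
    return hits


# Excerpt copied from the new device log quoted above.
excerpt = """\
CMD to set GPU frequency:
  export CK_CPU_FREQUENCY=max;/home/odroid/CK/ck-env/platform.init/generic-odroid/ck-set-gpu-performance
"""

print(mismatched_frequency_cmds(excerpt))
```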

psyhtest avatar Dec 07 '17 14:12 psyhtest