blueoil icon indicating copy to clipboard operation
blueoil copied to clipboard

Time measurement support on build. (#319)

Open ed728 opened this issue 5 years ago • 25 comments

Motivation and Context

Debug options should not be default, so I made the time measurement feature a togglable option for the blueoil.sh script. For running make inside of the project, you would have to manually append the CXXFLAGS, so make lm_fpga CXXFLAG="-DFUNC_TIME_MEASUREMENT".

This will be followed by proper user documentation for #319.

Description

-Extra optional option added to blueoil.sh. -Removed timing as a default for lm_fpga target. -CXXFLAGS now specified via make argument, rather than environment. -Timing is now only done when specified, not always by default. Debug options should not be default...

How has this been tested?

Jenkins.

Screenshots (if appropriate):

Types of changes

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature / Optimization (non-breaking change which adds functionality)
  • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • [ ] My change requires a change to the documentation.
  • [ ] I have updated the documentation accordingly.

ed728 avatar Jun 19 '19 11:06 ed728

I am not too confident about what the automated Jenkins tests cover, so someone please poke me in case it is not sufficient. I did run make test-dlk locally, and no errors popped in the beginning, but I am not sure whether it uses these scripts or not...

Also, as far as I am aware we have no tests in CI which rely on the time measurement feature, so nothing should break.

ed728 avatar Jun 19 '19 11:06 ed728

@iizukak Any suggestions for reviewers?

ed728 avatar Jun 19 '19 11:06 ed728

Also @n-nez, since I expect you guys would be ever so slightly affected by the Makefile change.

ed728 avatar Jun 19 '19 11:06 ed728

@ed728 make test-dlk uses CMake instead of raw Makefile...

primenumber avatar Jun 19 '19 11:06 primenumber

run blueoil test

iizukak avatar Jun 20 '19 00:06 iizukak

@ed728 tca_support is already merged. Please change base branch to master .

iizukak avatar Jun 20 '19 00:06 iizukak

@primenumber Yeah, Jenkins does cover the normal use case though.

ed728 avatar Jun 20 '19 00:06 ed728

-Timing is now only done when specified, not always by default. Debug options should not be the default...

Actually, the only uses of lm_fpga are

  • testing
  • time measurement

So keeping time measurement enabled by default for the target sounds sensible for me actually. Benefits of switching it off, on the other hand, is not clear. :thinking:

n-nez avatar Jun 20 '19 02:06 n-nez

Hmmm, I mean this stems from #319 which wants documentation for time measurements. And when looking through our documentation, for people wanting to run on FPGA it suggests running blueoil.sh which eventually invokes make lm_fpga... So any customer or external person who would like to use blueoil with an FPGA would go through that build as I see it.

Also, even for testing and time measuring, being able to disable internal debugs for "external"/"real" time measurements from OS is useful I think (i.e. timing how ./lm_fpga.elf ... takes as a whole with not other internal debug options enabled).

Oooor, I am missing something important about how customers are supposed to use this.

Ref.: https://docs.blue-oil.org/usage/convert.html

ed728 avatar Jun 20 '19 04:06 ed728

If "extenal" time measurements imply measurement of the execution time of whole binary itself, they do not make much sense to me, as there are lots of "preparation steps" like loading npy files and Network::init(), so measurements will be rather misleading.

Customers aren't supposed to use lm_fpga.elf at all, it's purely diagnostic/testing tool, to check that dlk conversion works as expected.

Customers supposed to use lib_fpga.so in their applications, for which TIME_MEASUREMENT is disabled, so they could do whatever kind of measurements which suits them if required.

n-nez avatar Jun 20 '19 05:06 n-nez

As for #319 documentation should say how to

  1. obtain npys from training checkpoint
  2. convert pb file
  3. do make lm_fpga
  4. run lm_fpga.elf with the npys and read the report

any mechanism to enable disable measurements in lm_fpga.elf in this case just increases the number of unnecessary work to get the numbers.

n-nez avatar Jun 20 '19 05:06 n-nez

It does make sense to measure all loadings and inits for a program that is meant to be used by customers, I think, but that is a different issue.

And I see....but in that case one should be able to enable it for the lm_fpga.so file I think.

Also, the lm_*.elf targets should either all include the measurement flag in the Makefile, or none of them should, I would say.

ed728 avatar Jun 20 '19 05:06 ed728

It does make sense to measure all loadings and inits for a program that is meant to be used by customers

Ok, this is a test program, for developers, customers could run it too if they want, but this is all.

Also, the lm_*.elf targets should either all include the measurement flag in the Makefile, or none of them should, I would say.

All of them should include it, I believe, but so far lm_fpga.elf was the only one target performance of which were relevant to us. I would be very appreciated if you add time measurement flag to all other elf targets we have in this PR, instead of removing it from lm_fpga.elf.

For lm_fpga.so I guess it is doable to enable time measurement for them as well, but the current way we measure time is not suitable for measurements across long runs.

That is it adds a new time interval on each function call into an internal data structure.

  • If you run inference a few times it will provide you with useful statistics, so you can see how much time the function takes to do work on each layer.
  • If you run inference million times which supposed to happen when you use lm_fpga.so the structure will grow unlimited and you will run out of memory at the end.
    • thus using a normal profiler seems to be a more sensible approach in this case.

n-nez avatar Jun 20 '19 06:06 n-nez

@ed728 Thanks 👍 lm_*.elf are only for test and debug. It is no need flag of enable or disable time measurement of these lm_*.elf, It always should measure time 🤔

ruimashita avatar Jun 20 '19 08:06 ruimashita

https://github.com/blue-oil/blueoil/pull/330#issuecomment-503877782 This @n-nez comment is what I want to do in #319 . Sorry for confusion.

iizukak avatar Jun 21 '19 00:06 iizukak

I will change this to enable it on all lm_*.elf.

On a side note, do we measure the time of things like Network::init() separately?

I still believe it should then be available for lm_*.so as a flag. I mean if it is disabled by default, no one will get an overflow just because of ignorance. And people who do know how to use it, may be able to get some nice data out of it (if we add some reasonable documentation to it). Also, for this overflow issue, should we make an issue and fix it (eventually)?

ed728 avatar Jun 21 '19 05:06 ed728

On a side note, do we measure the time of things like Network::init() separately?

Yes we do. You will see both init time and run time in the report. :smirk:

Also, for this overflow issue, should we make an issue and fix it (eventually)?

I believe priority of fixing it is quite low at the moment, but adding an issue definitely doesn't hurt.

n-nez avatar Jun 22 '19 06:06 n-nez

@ed728 Can you solve conflict?

iizukak avatar Jul 10 '19 00:07 iizukak

@iizukak Yes! Will do that as soon as possible.

ed728 avatar Jul 18 '19 05:07 ed728

@iizukak Fixed, I think. Waiting for tests.

ed728 avatar Aug 15 '19 02:08 ed728

@iizukak Any chance this can get approved?

ed728 avatar Sep 12 '19 21:09 ed728

@ed728 Can you answer @ruimashita 's comment?

iizukak avatar Sep 13 '19 00:09 iizukak

@ed728 Please check my understand.

You added --timing option for lib_* . And always add TIME_MEASUREMENT option for lm_*. Because lm_* is for debug only and lib_* is not debug only, correct?

iizukak avatar Sep 13 '19 00:09 iizukak

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Jun 12 '20 06:06 CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Jun 12 '20 06:06 CLAassistant