blueoil
blueoil copied to clipboard
Time measurement support on build. (#319)
Motivation and Context
Debug options should not be default, so I made the time measurement feature a togglable option for the blueoil.sh script. For running make
inside of the project, you would have to manually append the CXXFLAGS, so make lm_fpga CXXFLAG="-DFUNC_TIME_MEASUREMENT"
.
This will be followed by proper user documentation for #319.
Description
-Extra optional option added to blueoil.sh. -Removed timing as a default for lm_fpga target. -CXXFLAGS now specified via make argument, rather than environment. -Timing is now only done when specified, not always by default. Debug options should not be default...
How has this been tested?
Jenkins.
Screenshots (if appropriate):
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature / Optimization (non-breaking change which adds functionality)
- [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
I am not too confident about what the automated Jenkins tests cover, so someone please poke me in case it is not sufficient. I did run make test-dlk
locally, and no errors popped in the beginning, but I am not sure whether it uses these scripts or not...
Also, as far as I am aware we have no tests in CI which rely on the time measurement feature, so nothing should break.
@iizukak Any suggestions for reviewers?
Also @n-nez, since I expect you guys would be ever so slightly affected by the Makefile change.
@ed728 make test-dlk
uses CMake instead of raw Makefile...
run blueoil test
@ed728 tca_support
is already merged. Please change base branch to master
.
@primenumber Yeah, Jenkins does cover the normal use case though.
-Timing is now only done when specified, not always by default. Debug options should not be the default...
Actually, the only uses of lm_fpga
are
- testing
- time measurement
So keeping time measurement enabled by default for the target sounds sensible for me actually. Benefits of switching it off, on the other hand, is not clear. :thinking:
Hmmm, I mean this stems from #319 which wants documentation for time measurements. And when looking through our documentation, for people wanting to run on FPGA it suggests running blueoil.sh
which eventually invokes make lm_fpga
... So any customer or external person who would like to use blueoil with an FPGA would go through that build as I see it.
Also, even for testing and time measuring, being able to disable internal debugs for "external"/"real" time measurements from OS is useful I think (i.e. timing how ./lm_fpga.elf ...
takes as a whole with not other internal debug options enabled).
Oooor, I am missing something important about how customers are supposed to use this.
Ref.: https://docs.blue-oil.org/usage/convert.html
If "extenal" time measurements imply measurement of the execution time of whole binary itself, they do not make much sense to me, as there are lots of "preparation steps" like loading npy
files and Network::init()
, so measurements will be rather misleading.
Customers aren't supposed to use lm_fpga.elf
at all, it's purely diagnostic/testing tool, to check that dlk conversion works as expected.
Customers supposed to use lib_fpga.so
in their applications, for which TIME_MEASUREMENT
is disabled, so they could do whatever kind of measurements which suits them if required.
As for #319 documentation should say how to
- obtain
npy
s from training checkpoint - convert
pb
file - do
make lm_fpga
- run
lm_fpga.elf
with thenpy
s and read the report
any mechanism to enable disable measurements in lm_fpga.elf
in this case just increases the number of unnecessary work to get the numbers.
It does make sense to measure all loadings and inits for a program that is meant to be used by customers, I think, but that is a different issue.
And I see....but in that case one should be able to enable it for the lm_fpga.so
file I think.
Also, the lm_*.elf
targets should either all include the measurement flag in the Makefile, or none of them should, I would say.
It does make sense to measure all loadings and inits for a program that is meant to be used by customers
Ok, this is a test program, for developers, customers could run it too if they want, but this is all.
Also, the lm_*.elf targets should either all include the measurement flag in the Makefile, or none of them should, I would say.
All of them should include it, I believe, but so far lm_fpga.elf
was the only one target performance of which were relevant to us. I would be very appreciated if you add time measurement flag to all other elf
targets we have in this PR, instead of removing it from lm_fpga.elf
.
For lm_fpga.so
I guess it is doable to enable time measurement for them as well, but the current way we measure time is not suitable for measurements across long runs.
That is it adds a new time interval on each function call into an internal data structure.
- If you run inference a few times it will provide you with useful statistics, so you can see how much time the function takes to do work on each layer.
- If you run inference million times which supposed to happen when you use
lm_fpga.so
the structure will grow unlimited and you will run out of memory at the end.- thus using a normal profiler seems to be a more sensible approach in this case.
@ed728 Thanks 👍
lm_*.elf
are only for test and debug. It is no need flag of enable or disable time measurement of these lm_*.elf
, It always should measure time 🤔
https://github.com/blue-oil/blueoil/pull/330#issuecomment-503877782 This @n-nez comment is what I want to do in #319 . Sorry for confusion.
I will change this to enable it on all lm_*.elf
.
On a side note, do we measure the time of things like Network::init()
separately?
I still believe it should then be available for lm_*.so
as a flag. I mean if it is disabled by default, no one will get an overflow just because of ignorance. And people who do know how to use it, may be able to get some nice data out of it (if we add some reasonable documentation to it).
Also, for this overflow issue, should we make an issue and fix it (eventually)?
On a side note, do we measure the time of things like
Network::init()
separately?
Yes we do. You will see both init time and run time in the report. :smirk:
Also, for this overflow issue, should we make an issue and fix it (eventually)?
I believe priority of fixing it is quite low at the moment, but adding an issue definitely doesn't hurt.
@ed728 Can you solve conflict?
@iizukak Yes! Will do that as soon as possible.
@iizukak Fixed, I think. Waiting for tests.
@iizukak Any chance this can get approved?
@ed728 Can you answer @ruimashita 's comment?
@ed728 Please check my understand.
You added --timing
option for lib_*
. And always add TIME_MEASUREMENT
option for lm_*
. Because lm_*
is for debug only and lib_*
is not debug only, correct?
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.