Profiler improvements

Open andresovela opened this issue 1 year ago • 8 comments

This PR adds a few improvements to the MicroProfiler class.

bug=#2725

First, I made it a template so that it's possible to adjust the maximum number of events the profiler can handle via a template parameter.

The motivation for this change is to save memory. #1835 increased the maximum number of events from 1024 to 4096, which is a huge waste of RAM. The MicroProfiler class has these buffers:

const char* tags_[kMaxEvents];
uint32_t start_ticks_[kMaxEvents];
uint32_t end_ticks_[kMaxEvents];

struct TicksPerTag {
  const char* tag;
  uint32_t ticks;
};

TicksPerTag total_ticks_per_tag[kMaxEvents];

Each event needs 20 bytes (assuming 4-byte pointers, as on a typical 32-bit microcontroller), so with 4096 events an instance of MicroProfiler uses 80 KB of RAM, even if the model you're profiling has only a few nodes.
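
The per-event figure can be checked with a small host-side sketch. `ProfilerBuffersSketch`, `RoundUp`, and `BytesPerEvent` are hypothetical names used only for illustration; this is not the actual MicroProfiler class, it just mirrors the four buffers listed above and assumes that `TicksPerTag` is padded to a multiple of the pointer size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in that mirrors the four buffers listed above, sized by
// a template parameter. This is NOT the actual MicroProfiler class.
template <std::size_t kMaxEvents>
struct ProfilerBuffersSketch {
  struct TicksPerTag {
    const char* tag;
    uint32_t ticks;
  };

  const char* tags[kMaxEvents];
  uint32_t start_ticks[kMaxEvents];
  uint32_t end_ticks[kMaxEvents];
  TicksPerTag total_ticks_per_tag[kMaxEvents];
};

// Rounds value up to the next multiple of `multiple`.
constexpr std::size_t RoundUp(std::size_t value, std::size_t multiple) {
  return ((value + multiple - 1) / multiple) * multiple;
}

// Bytes per event for a given pointer size: one tag pointer, two tick
// counters, and one TicksPerTag entry (assumed padded to pointer alignment).
constexpr std::size_t BytesPerEvent(std::size_t pointer_size) {
  return pointer_size + 2 * sizeof(uint32_t) +
         RoundUp(pointer_size + sizeof(uint32_t), pointer_size);
}

// With 4-byte pointers: 4 + 4 + 4 + 8 = 20 bytes per event.
static_assert(BytesPerEvent(4) == 20, "20 bytes per event on 32-bit targets");
// 4096 events cost 80 KB, while a 128-event profiler would need 2.5 KB.
static_assert(BytesPerEvent(4) * 4096 == 80 * 1024, "4096 events -> 80 KB");
static_assert(BytesPerEvent(4) * 128 == 2560, "128 events -> 2.5 KB");
// The formula should also match the host layout on mainstream 32- and 64-bit
// ABIs, where pointers are aligned to their own size.
static_assert(sizeof(ProfilerBuffersSketch<4096>) ==
                  BytesPerEvent(sizeof(void*)) * 4096,
              "layout matches the formula on this host");
```

With a template parameter, a user profiling a small model could pick a capacity such as 128 events and pay roughly 2.5 KB instead of 80 KB on a 32-bit target.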

This PR also adds the following enum to specify the logging format, which removes some duplicated code and cleans up the API.

enum class MicroProfilerLogFormat {
  HumanReadable,
  Csv,
};

I replaced LogTicksPerTagCsv() with LogGrouped(), which does essentially the same thing but can also print in a human-readable format, which looks like this:

Cumulative event times:
Count    Tag                              Ticks        Time
6        DEPTHWISE_CONV_2D                3085702      5.194 ms
2        SUM                              1281206      2.156 ms
66       FULLY_CONNECTED                  934142       1.572 ms
10       CONV_2D                          765357       1.288 ms
67       QUANTIZE                         731923       1.232 ms
20       CONCATENATION                    362212       0.609 ms
76       STRIDED_SLICE                    335828       0.565 ms
3        RELU                             245643       0.413 ms
4        DEQUANTIZE                       156682       0.263 ms
6        PAD                              145834       0.245 ms
29       ADD                              130229       0.219 ms
65       RESHAPE                          49593        0.083 ms
13       MUL                              48852        0.082 ms
4        SUB                              16371        0.027 ms
6        LOGISTIC                         10336        0.017 ms
3        TANH                             4761         0.008 ms
3        UNPACK                           2738         0.004 ms

Total time: 13.985 ms (8307409 ticks)
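
A rough host-side sketch of how grouped logging with a format switch could work is shown below. It is not the PR's implementation: `EventSketch`, `GroupSketch`, `GroupByTag`, and `FormatGrouped` are hypothetical names, the CSV column order is an assumption, the column widths are only approximate, and the Time column is left out because it depends on the platform's tick rate. Sorting by descending ticks mirrors the sample output above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

enum class MicroProfilerLogFormat {
  HumanReadable,
  Csv,
};

// Hypothetical types for illustration only.
struct EventSketch {
  const char* tag;
  uint32_t start_ticks;
  uint32_t end_ticks;
};

struct GroupSketch {
  const char* tag;
  uint32_t count;
  uint32_t ticks;
};

// Accumulates the event count and elapsed ticks per tag, then sorts the
// groups by descending total ticks.
std::vector<GroupSketch> GroupByTag(const std::vector<EventSketch>& events) {
  std::vector<GroupSketch> groups;
  for (const EventSketch& e : events) {
    const uint32_t elapsed = e.end_ticks - e.start_ticks;
    auto it = std::find_if(groups.begin(), groups.end(),
                           [&](const GroupSketch& g) {
                             return std::strcmp(g.tag, e.tag) == 0;
                           });
    if (it == groups.end()) {
      groups.push_back({e.tag, 1, elapsed});
    } else {
      it->count += 1;
      it->ticks += elapsed;
    }
  }
  std::stable_sort(groups.begin(), groups.end(),
                   [](const GroupSketch& a, const GroupSketch& b) {
                     return a.ticks > b.ticks;
                   });
  return groups;
}

// Formats one line per group; the enum selects the layout, so a single code
// path serves both output formats.
std::string FormatGrouped(const std::vector<GroupSketch>& groups,
                          MicroProfilerLogFormat format) {
  std::string out;
  char line[128];
  for (const GroupSketch& g : groups) {
    const unsigned long count = static_cast<unsigned long>(g.count);
    const unsigned long ticks = static_cast<unsigned long>(g.ticks);
    if (format == MicroProfilerLogFormat::Csv) {
      // Assumed column order: count, tag, ticks.
      std::snprintf(line, sizeof(line), "%lu,%s,%lu\n", count, g.tag, ticks);
    } else {
      std::snprintf(line, sizeof(line), "%-8lu %-32s %lu\n", count, g.tag,
                    ticks);
    }
    out += line;
  }
  return out;
}
```

Keeping the grouping step separate from the formatting step is what lets one function replace a CSV-only logger: only the final `snprintf` call differs between formats.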

Finally, I also added TicksToUs() to micro_time.h for convenience.
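
For readers unfamiliar with the conversion, here is a minimal sketch of what a ticks-to-microseconds helper might look like. `TicksToUsSketch` is a hypothetical name and signature, not the function added in this PR; the real helper would presumably obtain the tick rate from the platform rather than take it as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not the PR's TicksToUs(): converts a tick count to
// microseconds for a tick rate given in ticks per second. The tick rate is a
// parameter here so the sketch stays self-contained.
inline uint64_t TicksToUsSketch(uint32_t ticks, uint32_t ticks_per_second) {
  if (ticks_per_second == 0) {
    return 0;  // Avoid dividing by zero when no timer is available.
  }
  // Widen to 64 bits before multiplying so that ticks * 1,000,000 cannot
  // overflow; the division truncates toward zero.
  return (static_cast<uint64_t>(ticks) * 1000000u) / ticks_per_second;
}
```

Doing the multiplication in 64-bit integer arithmetic avoids both overflow and the precision loss of a float-based conversion, at the cost of a 64-bit division that may be slow on small cores.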

andresovela avatar Oct 18 '24 13:10 andresovela

@andresovela

Thanks for your PR submission (and all the spelling corrections!).

This will require some discussion with @suleshahid. I like the template change for sizing the timing data.

ddavis-2015 avatar Oct 19 '24 22:10 ddavis-2015

Oops, closed by mistake.

andresovela avatar Nov 04 '24 12:11 andresovela

"This PR is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

github-actions[bot] avatar Dec 15 '24 10:12 github-actions[bot]

Waiting for review

andresovela avatar Dec 15 '24 11:12 andresovela

"This PR is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

github-actions[bot] avatar Jan 25 '25 10:01 github-actions[bot]

Waiting for review

andresovela avatar Jan 25 '25 23:01 andresovela

@andresovela Thank you for your interest in TFLM (RTLM) and for submitting this PR! Also, thank you for your persistence and perseverance.

Please re-merge with the main branch, so there are no conflicts.

While I think this is a good change to the profiler, it is still in internal discussion.

ddavis-2015 avatar Mar 04 '25 11:03 ddavis-2015

Updated

andresovela avatar Mar 04 '25 13:03 andresovela