onnx-mlir icon indicating copy to clipboard operation
onnx-mlir copied to clipboard

Instrument report op name and the node name of op is not matched.

Open hugulas opened this issue 2 years ago • 4 comments

I am trying to use instrumentation to understand the hotspot of the llama model.

As document instrumentation said,

The second column is the name of op. The third column is the node name of op. This is displayed when the op has onnx_node_name attribute.

But from the example in document, I found the node name of op and name of op are not matched. So do the instrumentation report I collected.

Problem

I found name of operators like " /model/Mul_1", "/model/Gather_1", "/model/layers.0/self_attn/Unsqueeze_8" in the report in onnx.Constant. But in fact, in the llama2 v7 7B onnx, op type of /model/Mul_1 is Mul, optype of /model/Gather_1 is Gather, and optype of /model/layers.0/self_attn/Unsqueeze_8 is Unsqueeze.

My instrumentation report:

==PERF-REPORT==, onnx.Constant, /model/norm/Constant_1, before, 0.000001, 0.000275
==PERF-REPORT==, onnx.Constant, /model/norm/Constant_1, after, 0.000000, 0.000275
==PERF-REPORT==, onnx.Constant, /model/norm/Constant_2, before, 0.000000, 0.000275
==PERF-REPORT==, onnx.Constant, /model/norm/Constant_2, after, 0.000001, 0.000276
==PERF-REPORT==, onnx.Constant, /model/ConstantOfShape_2, before, 0.000000, 0.000276
==PERF-REPORT==, onnx.Constant, /model/ConstantOfShape_2, after, 0.000001, 0.000277
==PERF-REPORT==, onnx.Constant, /model/Mul_1, before, 0.000000, 0.000277
==PERF-REPORT==, onnx.Constant, /model/Mul_1, after, 0.000000, 0.000277
==PERF-REPORT==, onnx.Constant, /model/Gather_1, before, 0.000001, 0.000278
==PERF-REPORT==, onnx.Constant, /model/Gather_1, after, 0.000000, 0.000278
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_8, before, 0.000000, 0.000278
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_8, after, 0.000001, 0.000279
==START-REPORT==
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_8, before, 0.000000, 0.000000
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_8, after, 0.000001, 0.000001
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_9, before, 0.000000, 0.000001
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_9, after, 0.000001, 0.000002
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_10, before, 0.000000, 0.000002
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_10, after, 0.000000, 0.000002
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_11, before, 0.000001, 0.000003
==PERF-REPORT==, onnx.Constant, /model/layers.0/self_attn/Unsqueeze_11, after, 0.000000, 0.000003

Steps:

I used following command to get time measurement report for onnx ops.

onnx-mlir -O3 --maccel=NNPA --mcpu=z16 -store-constants-to-file --instrument-stage=Onnx --instrument-ops=onnx.* --InstrumentBeforeOp --InstrumentAfterOp --InstrumentReportTime /external/onnx-models/llama-7b/decoder_model.onnx

Then I added env variable for runtime. ONNX_MLIR_INSTRUMENT_FILE="./llama2.onnx.instr.txt"

hugulas avatar Nov 07 '23 02:11 hugulas

But in fact, in the llama2 v7 7B onnx, op type of /model/Mul_1 is Mul,

Could you check if the inputs of Mul are constants? I am thinking whether Mul was folded into a Constant at compile time so that the op name was carried out from Mul.

tungld avatar Nov 07 '23 08:11 tungld

But in fact, in the llama2 v7 7B onnx, op type of /model/Mul_1 is Mul,

Could you check if the inputs of Mul are constants? I am thinking whether Mul was folded into a Constant at compile time so that the op name was carried out from Mul.

Yes. All three nodes contain constant inputs. If I want to aggregate the elapsed time for each operation type, should I include the elapsed time of "/model/Mul_1" under the "Mul" operation type, the "Constant" operation type, or both?

  • Input of /model/Mul_1 are /model/ConstantOfShape_2_output_0' and '/model/Constant_36_output_0'.
  • Input of /model/Gather_1 is '/model/Shape_1_output_0' and '/model/Constant_1_output_0'
  • Input of /model/layers.0/self_attn/Unsqueeze_8 are /model/layers.0/self_attn/Cast_1_output_0' and '/model/layers.0/self_attn/Constant_26_output_0'

chenqiny avatar Nov 07 '23 10:11 chenqiny

If I want to aggregate the elapsed time for each operation type, should I include the elapsed time of "/model/Mul_1" under the "Mul" operation type, the "Constant" operation type, or both?

Then you should care about operation type only, the op name has no strict relation with the operation type. In particular, we should count only "Mul" operation type.

FYI, we have a python script, namely make-report inside utils folder, to gather information from the performance output and report time for each operation type. For example python utils/make-report.py -r ./llama2.onnx.instr.txt. For information: https://github.com/onnx/onnx-mlir/pull/2451

tungld avatar Nov 07 '23 12:11 tungld

If I want to aggregate the elapsed time for each operation type, should I include the elapsed time of "/model/Mul_1" under the "Mul" operation type, the "Constant" operation type, or both?

Then you should care about operation type only, the op name has no strict relation with the operation type. In particular, we should count only "Mul" operation type.

FYI, we have a python script, namely make-report inside utils folder, to gather information from the performance output and report time for each operation type. For example python utils/make-report.py -r ./llama2.onnx.instr.txt. For information: #2451

Thank you. I found the make-report and compile log are really useful. It seems --profile-ir already provided instrumentation functions instead of using instrumentation options.

chenqiny avatar Nov 07 '23 15:11 chenqiny