
.tflite model is slower when run in mediapipe graph

Open orsveri opened this issue 3 years ago • 3 comments

Why does the MediaPipe inference calculator have a much longer inference time than simply running the .tflite model?

System information (Please provide as much relevant information as possible)

  • Have I written custom code (as opposed to using a stock example script provided in Mediapipe): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4): Linux Ubuntu 20.04 on Raspberry Pi 4B
  • MediaPipe version: Current master branch
  • Bazel version: 3.7.2
  • Solution (e.g. FaceMesh, Pose, Holistic): Hands

Describe the expected behavior:

I am profiling the solution. I use the TFLite benchmark tool to measure the .tflite models' performance, and the MediaPipe profiler to measure the performance of specific nodes in the MediaPipe graph. I built the solution with XNNPACK support enabled.

I expect the average inference time of the .tflite model and of the corresponding inference calculator node to be approximately the same.
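For comparison, here is a minimal sketch of timing the model outside of any MediaPipe graph with the TF Lite Python interpreter. The model path, thread count, and random input below are assumptions for illustration; the timings in this issue come from the C++ benchmark tool described in the steps, not from this script.

import time
import numpy as np
import tensorflow as tf

MODEL_PATH = "mediapipe/modules/palm_detection/palm_detection_full.tflite"

# Note: the Python interpreter's default delegate setup may differ from the
# tflite_with_xnnpack=true Bazel build, so treat these numbers as a rough check.
interpreter = tf.lite.Interpreter(model_path=MODEL_PATH, num_threads=3)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
dummy_input = np.random.rand(*input_details["shape"]).astype(np.float32)

# Warm-up runs so one-time initialization does not skew the average.
for _ in range(10):
    interpreter.set_tensor(input_details["index"], dummy_input)
    interpreter.invoke()

runs = 200
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(input_details["index"], dummy_input)
    interpreter.invoke()
print(f"Average inference time: {(time.perf_counter() - start) / runs * 1000:.1f} ms")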

Standalone code you may have used to try to get what you need:

  1. Build the solution with the following command: bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --define tflite_with_xnnpack=true mediapipe/examples/desktop/hand_tracking:hand_tracking_cpu
  2. At the beginning of the file <mediapipe_root_dir>/mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt, insert the following code:
profiler_config {
  trace_enabled: true
  enable_profiler: true
  trace_log_interval_count: 200
  trace_log_path: "<target_log_directory>"
}
  3. Run the solution (input_video_path and output_video_path parameters are optional): GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu --calculator_graph_config_file=mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt --input_video_path=<input_video_file_path> --output_video_path=<output_video_file_path>
  4. Get the .binarypb trace file and upload it to the visualizer, sort by avg time and find the slowest nodes - they will be the inference calculators for the palm_detection and hand_landmarks models.
  5. Download the TFLite benchmark tool from here. Measure the performance of the models with the following commands:
  • <path_to_tflite_benchmark_binary_file> --graph=<mediapipe_root_dir>/mediapipe/modules/palm_detection/palm_detection_full.tflite --num_threads=3 --use_xnnpack=true --num_runs=2000
  • <path_to_tflite_benchmark_binary_file> --graph=<mediapipe_root_dir>/mediapipe/modules/hand_landmark/hand_landmark_full.tflite --num_threads=3 --use_xnnpack=true --num_runs=2000

(Look for a line in the output that looks like: Inference timings in us: Init: 80256, First inference: 62178, Warmup (avg): 60112.9, Inference (avg): 60287.1)

  6. Compare the models' inference times (a small parsing sketch follows this list).
  7. ???
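For step 6, a minimal parsing sketch that extracts the Inference (avg) figure from the benchmark output and converts it to milliseconds, so it can be compared with the profiler's per-node averages. It assumes each benchmark run's output was saved to a text file; the file names below are placeholders, not files attached to this issue.

import re

# Minimal sketch: pull the average inference time (in microseconds) out of the
# TFLite benchmark tool's output and report it in milliseconds.
def avg_inference_ms(benchmark_log_path):
    with open(benchmark_log_path) as f:
        text = f.read()
    match = re.search(r"Inference \(avg\): ([0-9.]+)", text)
    if match is None:
        raise ValueError("no 'Inference (avg)' line found in " + benchmark_log_path)
    return float(match.group(1)) / 1000.0  # microseconds -> milliseconds

# Placeholder log file names, for illustration only.
for name, log in [("palm_detection", "palm_detection_benchmark.txt"),
                  ("hand_landmark", "hand_landmark_benchmark.txt")]:
    print(f"{name}: {avg_inference_ms(log):.1f} ms (tflite benchmark)")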

Other info / Complete Logs :

My average inference time results (the MediaPipe inference calculator is roughly 1.7-1.8x slower than the standalone benchmark for both models):

Model            tflite benchmark    mediapipe profiler
palm_detection   75 ms               133 ms
hand_landmark    60 ms               110 ms

My mediapipe profiler log file: link to download

My tflite benchmark tool result:

  • palm_detection_full.tflite model
STARTING!
Log parameter values verbosely: [0]
Min num runs: [2000]
Num threads: [3]
Graph: [/home/bg/tmp/mediapipe/mediapipe/modules/palm_detection/palm_detection_full.tflite]
#threads used for CPU inference: [3]
Use xnnpack: [1]
Loaded model /home/bg/tmp/mediapipe/mediapipe/modules/palm_detection/palm_detection_full.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
XNNPACK delegate created.
Explicitly applied XNNPACK delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
The input model file size (MB): 2.34128
Initialized session in 112.979ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=7 first=92201 curr=83114 min=73107 max=92201 avg=79946 std=6468

Running benchmark for at least 2000 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=1986 first=73574 curr=67474 min=65677 max=237263 avg=75151.7 std=16578

Inference timings in us: Init: 112979, First inference: 92201, Warmup (avg): 79946, Inference (avg): 75151.7
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=15.543 overall=42.1562
  • hand_landmark_full.tflite model
STARTING!
Log parameter values verbosely: [0]
Min num runs: [2000]
Num threads: [3]
Graph: [/home/bg/tmp/mediapipe/mediapipe/modules/hand_landmark/hand_landmark_full.tflite]
#threads used for CPU inference: [3]
Use xnnpack: [1]
Loaded model /home/bg/tmp/mediapipe/mediapipe/modules/hand_landmark/hand_landmark_full.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
XNNPACK delegate created.
Explicitly applied XNNPACK delegate, and the model graph will be completely executed by the delegate.
The input model file size (MB): 5.47869
Initialized session in 80.256ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=9 first=62178 curr=58597 min=58450 max=62720 avg=60112.9 std=1474

Running benchmark for at least 2000 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=2000 first=192849 curr=58375 min=50373 max=253885 avg=60287.1 std=15568

Inference timings in us: Init: 80256, First inference: 62178, Warmup (avg): 60112.9, Inference (avg): 60287.1
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=30.5781 overall=37.4414

orsveri avatar Mar 10 '22 08:03 orsveri

@orsveri, are you using the same data for both models?

sureshdagooglecom avatar Mar 24 '22 09:03 sureshdagooglecom

@sureshdagooglecom, yes. I am using a short video file. I uploaded it here, if you want to try it.

orsveri avatar Mar 24 '22 19:03 orsveri

I have the same issue on Android. I ported MoveNet Thunder, and the inference time is doubled compared to the official demo app.

SunXuan90 avatar Jul 01 '22 01:07 SunXuan90

Hello @orsveri, we are upgrading the MediaPipe Legacy Solutions to the new MediaPipe Solutions. However, the libraries, documentation, and source code for all the MediaPipe Legacy Solutions will continue to be available in our GitHub repository and through library distribution services, such as Maven and NPM.

You can continue to use those legacy solutions in your applications if you choose. However, we would request that you check out the new MediaPipe Solutions, which can help you more easily build and customize ML solutions for your applications. These new solutions will provide a superset of the capabilities available in the legacy solutions.

kuaashish avatar Apr 25 '23 12:04 kuaashish

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar May 03 '23 01:05 github-actions[bot]

This issue was closed due to lack of activity after being marked stale for the past 7 days.

github-actions[bot] avatar May 10 '23 01:05 github-actions[bot]
