mediapipe icon indicating copy to clipboard operation
mediapipe copied to clipboard

Initial holistic tracking activation is getting significantly slower

Open jianinz opened this issue 4 years ago • 7 comments

Please make sure that this is a solution issue.

System information (Please provide as much relevant information as possible)

  • Have I written custom code (as opposed to using a stock example script provided in Mediapipe): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4): Android 11
  • MediaPipe version: 0.8.7
  • Bazel version: 3.7.2
  • Solution (e.g. FaceMesh, Pose, Holistic): Holistic
  • Programming Language and version ( e.g. C++, Python, Java): Kotlin 1.5.0

Describe the expected behavior: When activating Holistic tracking, it should be as fast as using Mediapipe version 0.8.3.2 Standalone code you may have used to try to get what you need :

If there is a problem, provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/repo link /any notebook:

Other info / Complete Logs : Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached:

When using Mediapipe 0.8.3.2, activating Holistic tracking is really fast whereas it gets 13 seconds slower when switching to Mediapipe 0.8.6 and 0.8.7.

Please refer to android logs using Mediapipe 0.8.3.2 below, as you could see, it loads all the resources and activated holistic tracking in 1 second:

2021-08-26 14:35:36.181 31459-31526/com.xxx.instant I/native: I20210826 14:35:36.181147 31526 graph.cc:475] Start running the graph, waiting for inputs.
2021-08-26 14:35:36.181 31459-31526/com.xxx.instant I/native: I20210826 14:35:36.181371 31526 gl_context_egl.cc:163] Successfully initialized EGL. Major : 1 Minor: 5
2021-08-26 14:35:36.184 31459-28408/com.xxx.instant I/native: I20210826 14:35:36.183971 28408 gl_context.cc:331] GL version: 3.2 (OpenGL ES 3.2 [email protected] (GIT@4783c89, I46ff5fc46f, 1606807783) (Date:11/30/20))
2021-08-26 14:35:36.228 31459-28319/com.xxx.instant I/native: I20210826 14:35:36.228296 28319 resource_util_android.cc:87] Successfully loaded: pose_detection.tflite
2021-08-26 14:35:36.370 31459-28404/com.xxx.instant I/native: I20210826 14:35:36.370663 28404 resource_util_android.cc:87] Successfully loaded: pose_landmark_upper_body.tflite
2021-08-26 14:35:37.054 31459-28319/com.xxx.instant I/native: I20210826 14:35:37.054733 28319 resource_util_android.cc:87] Successfully loaded: face_detection_front.tflite
2021-08-26 14:35:37.057 31459-28319/com.xxx.instant E/tflite: Following operations are not supported by GPU delegate:
    DEQUANTIZE: 
    164 operations will run on the GPU, and the remaining 0 operations will run on the CPU.
2021-08-26 14:35:37.140 31459-28319/com.xxx.instant I/native: I20210826 14:35:37.140710 28319 resource_util_android.cc:87] Successfully loaded: hand_recrop.tflite
2021-08-26 14:35:37.234 31459-28319/com.xxx.instant I/native: I20210826 14:35:37.234760 28319 resource_util_android.cc:87] Successfully loaded: face_landmark.tflite
2021-08-26 14:35:37.558 31459-28319/com.xxx.instant I/native: I20210826 14:35:37.558815 28319 resource_util_android.cc:87] Successfully loaded: hand_landmark.tflite
2021-08-26 14:35:37.753 31459-28401/com.xxx.instant I/native: I20210826 14:35:37.753197 28401 resource_util_android.cc:87] Successfully loaded: handedness.txt
2021-08-26 14:35:37.753 31459-28319/com.xxx.instant I/native: I20210826 14:35:37.753664 28319 resource_util_android.cc:87] Successfully loaded: hand_recrop.tflite
2021-08-26 14:35:37.822 31459-28319/com.xxx.instant I/native: I20210826 14:35:37.822077 28319 resource_util_android.cc:87] Successfully loaded: hand_landmark.tflite
2021-08-26 14:35:37.991 31459-28404/com.xxx.instant I/native: I20210826 14:35:37.991438 28404 resource_util_android.cc:87] Successfully loaded: handedness.txt

Whereas it loads all the resources and activated holistic tracking in about 13 seconds if using Mediapipe 0.8.7 with pose_landmark_lite.tflite version:

2021-08-26 14:44:34.939 32116-32181/com.xxx.instant I/native: I20210826 14:44:34.939211 32181 graph.cc:474] Start running the graph, waiting for inputs.
2021-08-26 14:44:34.939 32116-32181/com.xxx.instant I/native: I20210826 14:44:34.939404 32181 gl_context_egl.cc:163] Successfully initialized EGL. Major : 1 Minor: 5
2021-08-26 14:44:34.941 32116-925/com.xxx.instant I/native: I20210826 14:44:34.941184   925 gl_context.cc:331] GL version: 3.2 (OpenGL ES 3.2 [email protected] (GIT@4783c89, I46ff5fc46f, 1606807783) (Date:11/30/20))
2021-08-26 14:44:34.973 32116-918/com.xxx.instant I/native: I20210826 14:44:34.973809   918 resource_util_android.cc:89] Successfully loaded: pose_landmark_lite.tflite
2021-08-26 14:44:34.991 32116-32340/com.xxx.instant I/native: I20210826 14:44:34.991230 32340 resource_util_android.cc:89] Successfully loaded: pose_detection.tflite
2021-08-26 14:44:35.008 32116-32340/com.xxx.instant I/tflite: Replacing 291 node(s) with delegate (unknown) node, yielding 1 partitions.
2021-08-26 14:44:35.279 32116-32340/com.xxx.instant I/tflite: Replacing 291 node(s) with delegate (unknown) node, yielding 1 partitions.
2021-08-26 14:44:35.935 32116-32116/com.xxx.instant I/delizer.instan: Explicit concurrent copying GC freed 9648(414KB) AllocSpace objects, 48(42MB) LOS objects, 75% free, 19MB/79MB, paused 130us total 63.095ms
2021-08-26 14:44:37.482 32116-32346/com.xxx.instant D/BufferPoolAccessor2.0: bufferpool2 0x6eeb395c68 : 4(1382400 size) total buffers - 4(1382400 size) used buffers - 0/4 (recycle/alloc) - 4/689 (fetch/transfer)
2021-08-26 14:44:37.647 32116-32130/com.xxx.instant I/delizer.instan: Background concurrent copying GC freed 15309(578KB) AllocSpace objects, 79(68MB) LOS objects, 75% free, 22MB/90MB, paused 196us total 155.637ms
2021-08-26 14:44:39.018 32116-32116/com.xxx.instant I/delizer.instan: Explicit concurrent copying GC freed 9638(445KB) AllocSpace objects, 45(39MB) LOS objects, 75% free, 19MB/79MB, paused 326us total 76.842ms
2021-08-26 14:44:41.195 32116-32130/com.xxx.instant I/delizer.instan: Background concurrent copying GC freed 15287(593KB) AllocSpace objects, 75(66MB) LOS objects, 75% free, 22MB/90MB, paused 438us total 137.780ms
2021-08-26 14:44:41.535 32116-32340/com.xxx.instant I/native: I20210826 14:44:41.535902 32340 tflite_gpu_runner.cc:142] OpenCL backend is used.
2021-08-26 14:44:41.857 32116-32340/com.xxx.instant I/tflite: Replacing 305 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
2021-08-26 14:44:42.034 32116-32116/com.xxx.instant I/delizer.instan: Waiting for a blocking GC Explicit
2021-08-26 14:44:42.094 32116-32116/com.xxx.instant I/delizer.instan: WaitForGcToComplete blocked Explicit on HeapTrim for 60.004ms
2021-08-26 14:44:42.153 32116-32116/com.xxx.instant I/delizer.instan: Explicit concurrent copying GC freed 1522(183KB) AllocSpace objects, 10(8204KB) LOS objects, 75% free, 19MB/79MB, paused 111us total 58.735ms
2021-08-26 14:44:42.806 32116-32346/com.xxx.instant D/BufferPoolAccessor2.0: bufferpool2 0x6eeb395c68 : 4(1382400 size) total buffers - 4(1382400 size) used buffers - 0/4 (recycle/alloc) - 4/819 (fetch/transfer)
2021-08-26 14:44:43.796 32116-32340/com.xxx.instant I/native: I20210826 14:44:43.796684 32340 resource_util_android.cc:89] Successfully loaded: face_detection_short_range.tflite
2021-08-26 14:44:43.810 32116-32340/com.xxx.instant I/tflite: Replacing 164 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
2021-08-26 14:44:44.069 32116-32130/com.xxx.instant I/delizer.instan: Background concurrent copying GC freed 15681(612KB) AllocSpace objects, 87(76MB) LOS objects, 74% free, 33MB/129MB, paused 8.333ms total 361.931ms
2021-08-26 14:44:44.948 32116-32340/com.xxx.instant I/native: I20210826 14:44:44.948566 32340 resource_util_android.cc:89] Successfully loaded: hand_recrop.tflite
2021-08-26 14:44:44.949 32116-32340/com.xxx.instant I/tflite: Replacing 63 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
2021-08-26 14:44:45.250 32116-32116/com.xxx.instant I/delizer.instan: Explicit concurrent copying GC freed 20647(783KB) AllocSpace objects, 117(103MB) LOS objects, 75% free, 19MB/79MB, paused 58us total 70.985ms
2021-08-26 14:44:45.875 32116-32340/com.xxx.instant I/native: I20210826 14:44:45.875205 32340 resource_util_android.cc:89] Successfully loaded: face_landmark.tflite
2021-08-26 14:44:45.879 32116-32340/com.xxx.instant I/tflite: Replacing 97 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
2021-08-26 14:44:46.909 32116-32340/com.xxx.instant I/native: I20210826 14:44:46.909003 32340 resource_util_android.cc:89] Successfully loaded: hand_landmark.tflite
2021-08-26 14:44:46.915 32116-32340/com.xxx.instant I/tflite: Replacing 64 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
2021-08-26 14:44:47.144 32116-32130/com.xxx.instant I/delizer.instan: Background concurrent copying GC freed 18116(727KB) AllocSpace objects, 82(69MB) LOS objects, 72% free, 36MB/132MB, paused 845us total 341.878ms
2021-08-26 14:44:47.164 32116-32340/com.xxx.instant I/native: I20210826 14:44:47.164664 32340 resource_util_android.cc:89] Successfully loaded: hand_recrop.tflite
2021-08-26 14:44:47.165 32116-924/com.xxx.instant I/native: I20210826 14:44:47.164994   924 resource_util_android.cc:89] Successfully loaded: handedness.txt
2021-08-26 14:44:47.165 32116-32340/com.xxx.instant I/tflite: Replacing 63 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
2021-08-26 14:44:47.335 32116-32340/com.xxx.instant I/native: I20210826 14:44:47.335409 32340 resource_util_android.cc:89] Successfully loaded: hand_landmark.tflite
2021-08-26 14:44:47.345 32116-32340/com.xxx.instant I/tflite: Replacing 64 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions.
2021-08-26 14:44:47.532 32116-921/com.xxx.instant I/native: I20210826 14:44:47.531862   921 resource_util_android.cc:89] Successfully loaded: handedness.tx

As i observed in the above log, the significant amount of time were spent on GC, and there is new log entry Replacing 97 node(s) with delegate (TfLiteGpuDelegate) node, yielding 1 partitions. has never been presented in Mediapipe 0.8.3.2.

Is there anything i could provide more in order to explain the slowness of holistic tracking activation?

jianinz avatar Aug 26 '21 06:08 jianinz

Hi @jianinz, Could you please provide your code changes to understand the issue much better. Thanks!

sgowroji avatar Aug 26 '21 07:08 sgowroji

Hi @sgowroji, Of course, I will share code changes regarding the holistic graph here, since it's relevant. Noticed that we have our private customised google/mediapipe-based repo, with updating Mediapipe from 0.8.3.2 to 0.8.7, we pulled all changes from Google/Mediapipe tag 0.8.7 and merged with our own private repo.

The customised holistic tracking graph is based on https://github.com/google/mediapipe/blob/master/mediapipe/graphs/holistic_tracking/holistic_tracking_gpu.pbtxt, we added several more nodes (customized hand gesture etc.)for our own purpose. They stay the same through versions like 0.8.3.2 to 0.8.7

The actual code changes from updating Mediapipe 0.8.3.2 to 0.8.7 regarding holistic tracking graph part is only like following: Screenshot 2021-08-26 at 15 34 08

Everything else is based on google/mediapipe and it stayed the same with version 0.8.3.2 to 0.8.7

jianinz avatar Aug 26 '21 08:08 jianinz

Hi @ivan-grishchenko @sgowroji , do you have time to look at this issue in any possible way? thanks!

jianinz avatar Sep 06 '21 03:09 jianinz

Hi @ivan-grishchenko @sgowroji, according to the logs above, it seems that when holistic tracking is activated in 0.8.6 and higher, ReplaceNodeSubsetsWithDelegateKernels in the subgraph.cc was invoked and node replacement sets was not empty. Whereas in 0.8.3.2 is empty, this issue lays in the blind of my knowledge area now to figure out why it happend so, could you give me some guidance or advice? thanks

jianinz avatar Oct 14 '21 09:10 jianinz

Hi @jiuqiant, appreciated if any tips you could shed some light on this issue

jianinz avatar Oct 18 '21 03:10 jianinz

@chuoling would be the best person to comment on this issue.

ivan-grishchenko avatar Nov 22 '21 00:11 ivan-grishchenko

I have tried the latest 0.8.10 build, the problem still persists. @chuoling is there anything you could comment on this issue or give me some guidance or hint? Thanks!

jianinz avatar Aug 04 '22 09:08 jianinz

Just an update here, i managed to solve it by using different delegate option.

Before, the inference calculator uses the gpu delegate:

node {
  calculator: "InferenceCalculator"
  input_stream: "TENSORS:input_tensors"
  output_stream: "TENSORS:detection_tensors"
  options: {
    [mediapipe.InferenceCalculatorOptions.ext] {
      model_path: "mediapipe/modules/pose_detection/pose_detection.tflite"
      delegate: { gpu { use_advanced_gpu_api: true } }
    }
  }
}

Now if using xnnpack, the performance will be dramatically increased. This can also be verified by using performance benchmark tool provided by Google https://www.tensorflow.org/lite/performance/measurement

node {
  calculator: "InferenceCalculator"
  input_stream: "TENSORS:input_tensors"
  output_stream: "TENSORS:detection_tensors"
  options: {
    [mediapipe.InferenceCalculatorOptions.ext] {
      model_path: "mediapipe/modules/pose_detection/pose_detection.tflite"
      delegate: { xnnpack {} }
    }
  }
}

jianinz avatar Aug 17 '22 10:08 jianinz

#3545 Maybe this error is same like mine. xnnpack is cpu , cpu is ok gpu is wrong. something error in gpu.

linyu0219 avatar Aug 29 '22 05:08 linyu0219

After going through your issue #3545 , think we got different issues. xnnpack is cpu indeed, although gpu delegate supposed to be faster especially on my pixel 5 which supports ops from gpu delegate. Not really sure gpu is indeed slow in my case

jianinz avatar Aug 29 '22 09:08 jianinz

Are you satisfied with the resolution of your issue? Yes No

google-ml-butler[bot] avatar Aug 29 '22 09:08 google-ml-butler[bot]