Inference is slow on MALI GPUs
Before you open an issue, please make sure you have tried the following steps:
- Make sure your environment matches the requirements listed at https://mace.readthedocs.io/en/latest/installation/env_requirement.html.
- Have you read the documentation for your use case?
- Check if your issue appears in HOW-TO-DEBUG or FAQ.
- The form below must be filled.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- NDK version(e.g., 15c): 18b
- GCC version(if compiling for host, e.g., 5.4.0): 5.4.0
- MACE version (Use the command: git describe --long --tags): 0.11.0-rc0
- Python version(2.7): 2.7
- Bazel version (e.g., 0.13.0): 0.16.0
Model deploy file (*.yml)
# The name of library
library_name: FD
target_abis: [arm64-v8a]
target_socs: [rk3399]
model_graph_format: file
model_data_format: file
models:
RF: # model tag, which will be used in model loading and must be specific.
platform: caffe
# path to your tensorflow model's pb file. Support local path, http:// and https://
model_file_path: /models/model.prototxt
weight_file_path: /models/model.caffemodel
# sha256_checksum of your model's pb file.
# use this command to get the sha256_checksum --> sha256sum path/to/your/pb/file
model_sha256_checksum: 81c388e812da37e499da8272eff0d7d140e8ae50dcb8d7e124dbd4e98462ad24
weight_sha256_checksum: 2250beffe1bc13f96f60b95fa37f48848bb31f567ae9eb763c86496a4ae29c9b
subgraphs:
- input_tensors:
- data
input_shapes:
- 1,3,640,480
input_data_formats:
- NCHW
output_tensors:
- face_rpn_cls_prob_stride128
- face_rpn_bbox_pred_stride128
- face_rpn_landmark_pred_stride128
- face_rpn_cls_prob_stride64
- face_rpn_bbox_pred_stride64
- face_rpn_landmark_pred_stride64
- face_rpn_cls_prob_stride32
- face_rpn_bbox_pred_stride32
- face_rpn_landmark_pred_stride32
- face_rpn_cls_prob_stride16
- face_rpn_bbox_pred_stride16
- face_rpn_landmark_pred_stride16
- face_rpn_cls_prob_stride8
- face_rpn_bbox_pred_stride8
- face_rpn_landmark_pred_stride8
- face_rpn_cls_prob_stride4
- face_rpn_bbox_pred_stride4
- face_rpn_landmark_pred_stride4
output_shapes:
- 1,2,5,5
- 1,4,5,5
- 1,10,5,5
- 1,2,10,10
- 1,4,10,10
- 1,10,10,10
- 1,2,20,20
- 1,4,20,20
- 1,10,20,20
- 1,2,40,40
- 1,4,40,40
- 1,10,40,40
- 1,2,80,80
- 1,4,80,80
- 1,10,80,80
- 1,2,160,160
- 1,4,160,160
- 1,10,160,160
output_data_formats:
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
- NCHW
obfuscate: 0
runtime: cpu+gpu # cpu, gpu or cpu+gpu or dsp
winograd: 4
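With 18 outputs listed three times (names, shapes, data formats), it is easy for the lists to drift apart. The sketch below is not part of MACE; it is a hypothetical stand-alone check that rebuilds the output lists from the pattern visible in the file above (six strides, three heads per stride) and verifies that they line up. The constant `SIDE = 640` is an assumption: it is simply the value for which `SIDE // stride` reproduces every listed spatial size. Since the listed outputs are square while the input shape is 1,3,640,480, that may be worth double-checking against the actual model.

```python
# Hypothetical consistency check for the subgraph section of the yml above.
STRIDES = [128, 64, 32, 16, 8, 4]
HEADS = [("cls_prob", 2), ("bbox_pred", 4), ("landmark_pred", 10)]  # (name, channels)
SIDE = 640  # assumption: value for which SIDE // stride matches the listed sizes


def build_outputs(strides=STRIDES, heads=HEADS, side=SIDE):
    """Return (output_tensors, output_shapes) in the order used in the yml."""
    tensors, shapes = [], []
    for stride in strides:
        size = side // stride
        for head, channels in heads:
            tensors.append("face_rpn_%s_stride%d" % (head, stride))
            shapes.append("1,%d,%d,%d" % (channels, size, size))
    return tensors, shapes


def check_subgraph(tensors, shapes, formats):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if not (len(tensors) == len(shapes) == len(formats)):
        problems.append("list lengths differ: %d tensors, %d shapes, %d formats"
                        % (len(tensors), len(shapes), len(formats)))
    for name, shape in zip(tensors, shapes):
        if len(shape.split(",")) != 4:
            problems.append("%s: expected 4 dimensions, got %r" % (name, shape))
    return problems


tensors, shapes = build_outputs()
formats = ["NCHW"] * len(tensors)
problems = check_subgraph(tensors, shapes, formats)
print("outputs: %d, problems: %s" % (len(tensors), problems))
```

In practice the three lists would be read from the yml file rather than rebuilt; they are generated here only to keep the example self-contained.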
Describe the problem
Inference on Mali GPUs is much slower than with other frameworks, and far slower than the same model running on Adreno GPUs.
To Reproduce
Steps to reproduce the problem:
1. cd /path/to/mace
2. python tools/converter.py convert --config_file=/path/to/your/model_deployment_file
3. python tools/converter.py benchmark --config_file=/path/to/your/model_deployment_file
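For repeated runs, the convert and benchmark steps above can be scripted. This is a minimal sketch, assuming it is pointed at a MACE checkout; `converter_command` and `run_steps` are hypothetical helper names, and only the two `tools/converter.py` invocations shown in the steps are used.

```python
import subprocess


def converter_command(action, config_file):
    """Build the argv list for one tools/converter.py step from this report."""
    if action not in ("convert", "benchmark"):
        raise ValueError("this sketch only covers convert and benchmark")
    return ["python", "tools/converter.py", action,
            "--config_file=" + config_file]


def run_steps(mace_root, config_file):
    """Run convert, then benchmark, from the MACE checkout at mace_root."""
    for action in ("convert", "benchmark"):
        # check_call raises CalledProcessError if a step exits non-zero.
        subprocess.check_call(converter_command(action, config_file),
                              cwd=mace_root)

# Example (paths are placeholders, as in the steps above):
# run_steps("/path/to/mace", "/path/to/your/model_deployment_file")
```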
Error information / logs
Please include the full log and/or traceback here.
LOGs
Additional context
For example, with the yml file above, the model takes:
- 328 ms on a Mali T864
- 18 ms on an Adreno 640
- 233 ms on a Mali T864 (using Alibaba/MNN)
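To put the three timings in proportion, the short sketch below computes the slowdown factors. The numbers are the ones reported above; the dictionary keys and the `slowdown` helper are illustrative names only.

```python
# Reported inference times in milliseconds (from the list above).
timings_ms = {
    "mace_mali_t864": 328.0,
    "mace_adreno_640": 18.0,
    "mnn_mali_t864": 233.0,
}


def slowdown(slow_ms, fast_ms):
    """How many times longer the slow run takes than the fast one."""
    return slow_ms / fast_ms


mali_vs_adreno = slowdown(timings_ms["mace_mali_t864"], timings_ms["mace_adreno_640"])
mace_vs_mnn = slowdown(timings_ms["mace_mali_t864"], timings_ms["mnn_mali_t864"])

print("Mali T864 vs Adreno 640: %.1fx slower" % mali_vs_adreno)
print("MACE vs MNN on Mali T864: %.2fx slower" % mace_vs_mnn)
```

The two devices differ in raw GPU capability, so the first ratio is not a like-for-like comparison; the second one, on the same GPU, is the more direct indication of a framework-level gap.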
I will check this later. Have you ever benchmarked another model like MobileNet? BTW, which backend of MNN did you use? OpenCL or Vulkan?
Well, my model has a mobilenet backbone with just some feature extraction layers on the top.
I've tried both the Vulkan and OpenCL backends in MNN. OpenCL is faster in my case, so the MNN time in the initial post is the OpenCL one.
@lydoc have you had the chance to investigate yet?
Sorry for the late reply. Would you be able to share your model?
@lydoc you can grab the model files and the updated yaml configuration here
We are having the same issue:
SM-G960U | 10.31 FPS (Samsung Galaxy S9 GLOBAL with Adreno 630)
SM-N960F | 5.68 FPS (Samsung Galaxy S9 EU with Mali-G72)
We get about half the FPS on the Mali GPU variant of the same phone model.
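Converting the two FPS figures to per-frame times makes them easier to compare with the millisecond timings earlier in the thread. A minimal sketch using only the numbers reported in this comment; the variable and function names are illustrative.

```python
# Reported throughput in frames per second (from the comment above).
fps = {"SM-G960U": 10.31, "SM-N960F": 5.68}


def frame_time_ms(frames_per_second):
    """Average time per frame in milliseconds for a given FPS."""
    return 1000.0 / frames_per_second


adreno_ms = frame_time_ms(fps["SM-G960U"])
mali_ms = frame_time_ms(fps["SM-N960F"])
fps_ratio = fps["SM-N960F"] / fps["SM-G960U"]

print("SM-G960U: %.1f ms/frame" % adreno_ms)
print("SM-N960F: %.1f ms/frame" % mali_ms)
print("Mali/Adreno FPS ratio: %.2f" % fps_ratio)
```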