insightface
insightface.app.FaceAnalysis works slowly
It takes 5 seconds to get the output below on a normal Windows PC, but over 35 seconds on a Mac M1. After the first
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
appears, it gets stuck for about 30 seconds. Is there any way to solve this problem?
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/xxx/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/genderage.onnx genderage
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/w600k_r50.onnx recognition
set det-size: (640, 640)
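For reference, here is a minimal sketch of how I measure the load time (the model name and det-size are the ones from the log above; the timings are the ones reported):

```python
import time

from insightface.app import FaceAnalysis

# Time the model-loading step that gets stuck on the M1.
t0 = time.time()
app = FaceAnalysis(name='buffalo_l')
app.prepare(ctx_id=-1, det_size=(640, 640))  # ctx_id < 0 selects the CPU
print(f'loading took {time.time() - t0:.1f}s')  # ~5 s on Windows, ~35 s on Mac M1
```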
You should install `onnxruntime-gpu` instead of `onnxruntime`.
I'm not quite sure, but it seems it's impossible to install `onnxruntime-gpu` right now.
I temporarily solved this problem by loading ONLY the detection model, instead of loading ALL models and dropping most of them.
@nttstar Have I misunderstood your answer? I already tried, but `onnxruntime-gpu` cannot be installed on Mac M1 (Ventura 13.0).
I'm not quite sure, but it seems it's impossible to install `onnxruntime-gpu` right now. `pip install onnxruntime-gpu` gives:
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu
Here is my env-info:
- `conda list | grep onnx` (can run, but loading modules is slow):
  onnx                 1.12.0  pypi_0  pypi
  onnxruntime-silicon  1.11.1  pypi_0  pypi
- `conda list | grep insightface`:
  insightface  0.6.2  pypi_0  pypi
And here is my code:
import glob
import os.path as osp

import onnxruntime
from insightface.app import FaceAnalysis
from insightface.model_zoo import model_zoo
from insightface.utils import DEFAULT_MP_NAME, ensure_available

MIN_DETECTION_SIZE = 640  # placeholder value: this constant is defined elsewhere in my project


class FaceAnalysisSpeedUp(FaceAnalysis):
    """
    Overwrite some parts of `insightface.app.FaceAnalysis` to enhance loading speed.
    """

    def __init__(self,
                 name=DEFAULT_MP_NAME,
                 root='~/.insightface',
                 min_width=MIN_DETECTION_SIZE,
                 min_height=MIN_DETECTION_SIZE,
                 if_speed_up=True,
                 **kwargs):
        if if_speed_up:
            # Replicate FaceAnalysis.__init__, but glob ONLY the detection
            # model instead of every *.onnx file in the model directory.
            onnxruntime.set_default_logger_severity(3)
            self.models = {}
            self.model_dir = ensure_available('models', name, root=root)
            onnx_files = glob.glob(osp.join(self.model_dir, 'det_10g.onnx'))
            assert len(onnx_files) == 1
            model = model_zoo.get_model(onnx_files[0], **kwargs)
            print('find model:', onnx_files[0],
                  model.taskname, model.input_shape, model.input_mean, model.input_std)
            self.models[model.taskname] = model
            assert 'detection' in self.models
            self.det_model = self.models['detection']
        else:
            # Fall back to the stock loader, restricted to the detection module.
            super().__init__(name=name, root=root,
                             allowed_modules=['detection'], **kwargs)
        # used by self.get()
        self.min_width = min_width
        self.min_height = min_height
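And a hypothetical usage example of the class above (the image path and det-size are just placeholders):

```python
import cv2

# Load ONLY the detection model, then run detection on a test image.
app = FaceAnalysisSpeedUp(if_speed_up=True)
app.prepare(ctx_id=-1, det_size=(640, 640))  # ctx_id < 0 selects the CPU
faces = app.get(cv2.imread('test.jpg'))
print(f'detected {len(faces)} face(s)')
```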
Any news about this issue? @nttstar
With Mac M1/M2 you should use `onnxruntime-silicon` instead.
> With Mac M1/M2 you should use `onnxruntime-silicon` instead.
@phineas-pta Half right, half wrong for me. `onnxruntime-silicon` is made for CoreML:

> The official ONNX Runtime now contains arm64 binaries for macOS as well, but they only support the CPU backend. This version adds the CoreML backend with version v1.13.0.

Although I haven't got a clear answer from the insightface developer, it seems that insightface only supports CUDA and CPU at present and does not support CoreML. (I later tried to read the official onnxruntime documentation, but couldn't understand it at all due to my lack of ability, so the discussion here is just my personal speculation.)
And the main point is that the problem here is about module loading speed, not image processing speed. More specifically, loading the landmark modules takes too much time:
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106
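A quick sketch to show which models dominate the load time (assuming the default buffalo_l model directory; timings will vary per machine):

```python
import glob
import os.path as osp
import time

from insightface.model_zoo import model_zoo

# Time each ONNX model load individually.
model_dir = osp.expanduser('~/.insightface/models/buffalo_l')
for onnx_file in sorted(glob.glob(osp.join(model_dir, '*.onnx'))):
    t0 = time.time()
    model = model_zoo.get_model(onnx_file)
    print(f'{osp.basename(onnx_file)} ({model.taskname}): {time.time() - t0:.1f}s')
```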
I guess the problem lies in `onnxruntime` on M1; the only thing `insightface` can do is improve the loading logic in `FaceAnalysis`.
Since I don't have a Mac, I'm not sure whether CoreML can speed up loading time. But comparing against a Windows PC isn't simple: it depends on SSD + RAM speed, so you have to find a Windows PC with roughly the same config as your Mac to make a proper comparison.
Now regarding "it seems that insightface only supports cuda and cpu": you misread the code a bit. Let's take another look at https://github.com/deepinsight/insightface/blob/master/python-package/insightface/model_zoo/model_zoo.py

`get_default_providers()` returns CUDA or CPU, but where is it used? Look down a few lines and you see `providers = kwargs.get('providers', get_default_providers())`.

MEANING: if you don't explicitly specify `providers`, it will try CUDA or CPU by default, NOT MEANING it only uses CUDA and CPU.

Example of how to specify `providers`:

model_zoo.get_model(model_path, providers=onnxruntime.get_available_providers())

(just like the other person commented on your other issue)
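And since `FaceAnalysis` forwards extra keyword arguments down to `model_zoo.get_model`, the same thing should work through the high-level API as well (a sketch; I haven't verified it on a Mac):

```python
import onnxruntime
from insightface.app import FaceAnalysis

# Explicitly hand every available provider (e.g. CoreML + CPU) to all models.
app = FaceAnalysis(name='buffalo_l',
                   providers=onnxruntime.get_available_providers())
app.prepare(ctx_id=0, det_size=(640, 640))
```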
> ...you have to find a Windows PC with roughly the same config as your Mac to make a proper comparison
Sorry for forgetting to mention this before: I have already tested with two Mac M1 machines and one Windows PC. The loading speed of insightface on the Mac M1s is extremely slow, while everything works normally on the Windows PC.
> now regarding "it seems that insightface only supports cuda and cpu": ... (just like the other person commented on your other issue)
And sorry again, because I forgot to add the test results I obtained months ago: after I installed `onnxruntime-silicon` and checked via `onnxruntime.get_available_providers()`, insightface didn't load the modules any faster than before (nor did images process any faster).
And I guess you may have missed this part above?
> And the main point is that the problem here is about module loading speed, not image processing speed. More specifically, loading the landmark modules takes too much time ...
I bring this up not to insist that I'm right, but because you didn't seem to respond to this paragraph, so I'm raising it again here.
The tests above were done four months ago, and my memory of them is now rather weak. I can only vaguely remember that I did run CoreML tests and saw no improvement in any aspect. But it's entirely possible that I only tested a little here and there: maybe I never used CoreML to test the loading speed at all, or maybe it actually improved the execution speed and I just couldn't observe it. 😭
@phineas-pta Thank you very much for your response. I will find time to test it again and update the results here.
> you didn't seem to respond to this paragraph

That's literally my 1st paragraph.
> tested with two Mac M1 machines and one Windows PC

Again, for comparison, the Windows PC must have the same specs as the Mac in terms of SSD + RAM speed; it also matters whether the Windows PC was using CUDA or not.
> After I installed onnxruntime-silicon, and checked via onnxruntime.get_available_providers()

As I said, you have to explicitly specify providers when calling `model_zoo.get_model()`; installation alone isn't enough.
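To make the difference concrete, a sketch of "available" vs. "actually used" providers (the model path is just a placeholder):

```python
import onnxruntime

# Providers the installed onnxruntime build COULD use:
print(onnxruntime.get_available_providers())

# Recent onnxruntime versions want providers passed explicitly;
# a session only uses what it is given.
sess = onnxruntime.InferenceSession('det_10g.onnx',
                                    providers=['CPUExecutionProvider'])
print(sess.get_providers())  # what this session will actually run on
```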
> That's literally my 1st paragraph. Again, for comparison, the Windows PC must have the same specs as the Mac in terms of SSD + RAM speed; it also matters whether the Windows PC was using CUDA or not.

Got it, but in my opinion a difference in SSD + RAM shouldn't cause a delay of over 30 seconds. (On Windows it takes less than 1 second, while on the Mac M1 it takes at least 30 seconds, sometimes even more.)
> As I said, you have to explicitly specify providers when calling model_zoo.get_model(); installation alone isn't enough.

Yes, I did that for sure. Sorry again 😭 for providing incomplete info.
> (just like the other person commented on your other issue)

@phineas-pta By the way, doesn't https://github.com/deepinsight/insightface/issues/2238#issuecomment-1481081198 prove that insightface doesn't support CoreML? (That's actually me: that one is my personal account, and this one is my work account 🤣)
lmao now that makes sense :rofl:
so hardware shouldn't be the problem, but maybe software + drivers then
> prove that insightface doesn't support CoreML

That comes from `onnxruntime`: see microsoft/onnxruntime#14212 for a possible update.
I come from the roop community, and I saw this from a Mac user: it works for det_10g.onnx:
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Maybe because of a newer `onnxruntime`?
> maybe because of a newer onnxruntime
Sounds like a good research direction, thanks for your help 👍
This issue has been resolved over time. Now it takes at most 10 seconds to load the modules, compared to the previous 35 seconds.
I haven't done in-depth verification yet, but I speculate that the improvement in insightface's module loading speed is likely due to optimizations onnxruntime made for M1 compatibility in the meantime.
- `conda list "insightface|onnx"`:
  # Name        Version  Build   Channel
  insightface   0.7.3    pypi_0  pypi
  onnx          1.15.0   pypi_0  pypi
  onnxruntime   1.16.2   pypi_0  pypi
Note that I installed `onnxruntime` instead of `onnxruntime-silicon` because I previously thought insightface doesn't support CoreML.