insightface
insightface.app.FaceAnalysis works slowly
It takes 5 seconds to get the output below on a normal Windows PC, but over 35 seconds on a Mac M1. After the first
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
appears, it gets stuck for about 30 seconds. Is there any way to solve this problem?
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /Users/xxx/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/genderage.onnx genderage
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/w600k_r50.onnx recognition
set det-size: (640, 640)
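For reference, here is a minimal sketch of how I measure the load time (the model name and det-size are the ones from the log above; the timings are the ones reported):

```python
import time

from insightface.app import FaceAnalysis

# Time the model-loading step that gets stuck on the M1.
t0 = time.time()
app = FaceAnalysis(name='buffalo_l')
app.prepare(ctx_id=-1, det_size=(640, 640))  # ctx_id < 0 selects the CPU
print(f'loading took {time.time() - t0:.1f}s')  # ~5 s on Windows, ~35 s on Mac M1
```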
You should install `onnxruntime-gpu` instead of `onnxruntime`.
I'm not quite sure, but it seems it's impossible to install `onnxruntime-gpu` right now.
I temporarily solved this problem by loading ONLY the detection model, instead of loading ALL models and dropping most of them.
@nttstar Have I misunderstood your answer? I already tried, but `onnxruntime-gpu` cannot be installed on Mac M1 (Ventura 13.0).
I'm not quite sure, but it seems it's impossible to install `onnxruntime-gpu` right now. `pip install onnxruntime-gpu` gives:
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu (from versions: none)
ERROR: No matching distribution found for onnxruntime-gpu
Here is my env-info:
- `conda list | grep onnx` (can run, but loading modules is slow):
  onnx                 1.12.0  pypi_0  pypi
  onnxruntime-silicon  1.11.1  pypi_0  pypi
- `conda list | grep insightface`:
  insightface  0.6.2  pypi_0  pypi
And here is my code:
import glob
import os.path as osp

import onnxruntime
from insightface.app import FaceAnalysis
from insightface.model_zoo import model_zoo
from insightface.utils import DEFAULT_MP_NAME, ensure_available

MIN_DETECTION_SIZE = 640  # placeholder value: this constant is defined elsewhere in my project


class FaceAnalysisSpeedUp(FaceAnalysis):
    """
    Overwrite some parts of `insightface.app.FaceAnalysis` to enhance loading speed.
    """

    def __init__(self,
                 name=DEFAULT_MP_NAME,
                 root='~/.insightface',
                 min_width=MIN_DETECTION_SIZE,
                 min_height=MIN_DETECTION_SIZE,
                 if_speed_up=True,
                 **kwargs):
        if if_speed_up:
            # Replicate FaceAnalysis.__init__, but glob ONLY the detection
            # model instead of every *.onnx file in the model directory.
            onnxruntime.set_default_logger_severity(3)
            self.models = {}
            self.model_dir = ensure_available('models', name, root=root)
            onnx_files = glob.glob(osp.join(self.model_dir, 'det_10g.onnx'))
            assert len(onnx_files) == 1
            model = model_zoo.get_model(onnx_files[0], **kwargs)
            print('find model:', onnx_files[0],
                  model.taskname, model.input_shape, model.input_mean, model.input_std)
            self.models[model.taskname] = model
            assert 'detection' in self.models
            self.det_model = self.models['detection']
        else:
            # Fall back to the stock loader, restricted to the detection module.
            super().__init__(name=name, root=root,
                             allowed_modules=['detection'], **kwargs)
        # used by self.get()
        self.min_width = min_width
        self.min_height = min_height
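And a hypothetical usage example of the class above (the image path and det-size are just placeholders):

```python
import cv2

# Load ONLY the detection model, then run detection on a test image.
app = FaceAnalysisSpeedUp(if_speed_up=True)
app.prepare(ctx_id=-1, det_size=(640, 640))  # ctx_id < 0 selects the CPU
faces = app.get(cv2.imread('test.jpg'))
print(f'detected {len(faces)} face(s)')
```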
Any news about this issue? @nttstar
With Mac M1/M2 you should use `onnxruntime-silicon` instead.
> With Mac M1/M2 you should use `onnxruntime-silicon` instead.
@phineas-pta Half right, half wrong for me. `onnxruntime-silicon` is made for CoreML:

> The official ONNX Runtime now contains arm64 binaries for macOS as well, but they only support the CPU backend. This version adds the CoreML backend with version v1.13.0.

Although I haven't got a clear answer from the insightface developer, it seems that insightface only supports CUDA and CPU at present and does not support CoreML. (I later tried to read the official onnxruntime documentation, but couldn't understand it at all due to my lack of ability, so the discussion here is just my personal speculation.)
And the main point is that the problem here is about module loading speed, not image processing speed. More specifically, loading the landmark modules takes too much time:
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
model ignore: /Users/xxx/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106
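A quick sketch to show which models dominate the load time (assuming the default buffalo_l model directory; timings will vary per machine):

```python
import glob
import os.path as osp
import time

from insightface.model_zoo import model_zoo

# Time each ONNX model load individually.
model_dir = osp.expanduser('~/.insightface/models/buffalo_l')
for onnx_file in sorted(glob.glob(osp.join(model_dir, '*.onnx'))):
    t0 = time.time()
    model = model_zoo.get_model(onnx_file)
    print(f'{osp.basename(onnx_file)} ({model.taskname}): {time.time() - t0:.1f}s')
```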
I guess the problem lies in `onnxruntime` on M1; the only thing `insightface` can do is improve the loading logic in `FaceAnalysis`.
Since I don't have a Mac, I'm not sure whether CoreML can speed up loading time. But comparing against a Windows PC isn't simple: it depends on SSD + RAM speed, so you have to find a Windows PC with roughly the same config as your Mac to make a proper comparison.
Now regarding "it seems that insightface only supports cuda and cpu": you misread the code a bit. Let's take another look at https://github.com/deepinsight/insightface/blob/master/python-package/insightface/model_zoo/model_zoo.py

`get_default_providers()` returns CUDA or CPU, but where is it used? Look down a few lines and you see `providers = kwargs.get('providers', get_default_providers())`.

MEANING: if you don't explicitly specify `providers`, it will try CUDA or CPU by default, NOT MEANING it only uses CUDA and CPU.

Example of how to specify `providers`:

model_zoo.get_model(model_path, providers=onnxruntime.get_available_providers())

(just like the other person commented on your other issue)
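And since `FaceAnalysis` forwards extra keyword arguments down to `model_zoo.get_model`, the same thing should work through the high-level API as well (a sketch; I haven't verified it on a Mac):

```python
import onnxruntime
from insightface.app import FaceAnalysis

# Explicitly hand every available provider (e.g. CoreML + CPU) to all models.
app = FaceAnalysis(name='buffalo_l',
                   providers=onnxruntime.get_available_providers())
app.prepare(ctx_id=0, det_size=(640, 640))
```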
> ...you have to find a Windows PC with roughly the same config as your Mac to make a proper comparison
Sorry for forgetting to mention this before: I have already tested with two Mac M1 machines and one Windows PC. The loading speed of insightface on the Mac M1s is extremely slow, while everything works normally on the Windows PC.
> now regarding "it seems that insightface only supports cuda and cpu": ... (just like the other person commented on your other issue)
And sorry again, because I forgot to add the test results I obtained months ago: after I installed `onnxruntime-silicon` and checked via `onnxruntime.get_available_providers()`, insightface didn't load the modules any faster than before (nor did images process any faster).
And I guess you may have missed this part above?
> And the main point is that the problem here is about module loading speed, not image processing speed. More specifically, loading the landmark modules takes too much time ...
I bring this up not to insist that I'm right, but because you didn't seem to respond to this paragraph, so I'm raising it again here.
The tests above were done four months ago, and my memory of them is now rather weak. I can only vaguely remember that I did run CoreML tests and saw no improvement in any aspect. But it's entirely possible that I only tested a little here and there: maybe I never used CoreML to test the loading speed at all, or maybe it actually improved the execution speed and I just couldn't observe it. 😭
@phineas-pta Thank you very much for your response. I will find time to test it again and update the results here.
> you didn't seem to respond to this paragraph

That's literally my 1st paragraph.
> tested with two Mac M1 machines and one Windows PC

Again, for comparison, the Windows PC must have the same specs as the Mac in terms of SSD + RAM speed; it also matters whether the Windows PC was using CUDA or not.
> After I installed onnxruntime-silicon, and checked via onnxruntime.get_available_providers()

As I said, you have to explicitly specify providers when calling `model_zoo.get_model()`; installation alone isn't enough.
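To make the difference concrete, a sketch of "available" vs. "actually used" providers (the model path is just a placeholder):

```python
import onnxruntime

# Providers the installed onnxruntime build COULD use:
print(onnxruntime.get_available_providers())

# Recent onnxruntime versions want providers passed explicitly;
# a session only uses what it is given.
sess = onnxruntime.InferenceSession('det_10g.onnx',
                                    providers=['CPUExecutionProvider'])
print(sess.get_providers())  # what this session will actually run on
```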
> That's literally my 1st paragraph. Again, for comparison, the Windows PC must have the same specs as the Mac in terms of SSD + RAM speed; it also matters whether the Windows PC was using CUDA or not.

Got it, but in my opinion a difference in SSD + RAM shouldn't cause a delay of over 30 seconds. (On Windows it takes less than 1 second, while on the Mac M1 it takes at least 30 seconds, sometimes even more.)
> As I said, you have to explicitly specify providers when calling model_zoo.get_model(); installation alone isn't enough.

Yes, I did that for sure. Sorry again 😭 for providing incomplete info.
> (just like the other person commented on your other issue)

@phineas-pta By the way, doesn't https://github.com/deepinsight/insightface/issues/2238#issuecomment-1481081198 prove that insightface doesn't support CoreML? (That's actually me: that one is my personal account, and this one is my work account 🤣)
lmao now that makes sense :rofl:
so hardware shouldn't be the problem, but maybe software + drivers then
> prove that insightface doesn't support CoreML

That comes from `onnxruntime`: see microsoft/onnxruntime#14212 for a possible update.
I come from the roop community, and I saw this from a Mac user: it works for det_10g.onnx:
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/███/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
Maybe because of a newer `onnxruntime`?
> maybe because of a newer onnxruntime
Sounds like a good research direction, thanks for your help 👍
This issue has been resolved over time. Now it takes at most 10 seconds to load the modules, compared to the previous 35 seconds.
I haven't done in-depth verification yet, but I speculate that the improvement in insightface's module loading speed is likely due to optimizations onnxruntime made for M1 compatibility in the meantime.
- `conda list "insightface|onnx"`:
  # Name        Version  Build   Channel
  insightface   0.7.3    pypi_0  pypi
  onnx          1.15.0   pypi_0  pypi
  onnxruntime   1.16.2   pypi_0  pypi
Note that I installed `onnxruntime` instead of `onnxruntime-silicon` because I previously thought insightface doesn't support CoreML.