InsightFace-REST

Fails to start on CPU with error: parsing message with type 'ONNX_REL_1_8.ModelProto'

Open spacemolly opened this issue 2 years ago • 11 comments

I have an issue running a CPU deployment with recent versions. I've had this running before without issues, but building a CPU Docker image using the defaults (scrfd_2.5g_gnkps, glintr100) now fails to start. I get an error message saying: Error parsing message with type 'ONNX_REL_1_8.ModelProto'.

Full logs:

Preparing models...
[14:38:10] INFO - Preparing 'scrfd_2.5g_gnkps' model...
[14:38:10] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[14:38:10] INFO - 'scrfd_2.5g_gnkps' model ready!
[14:38:10] INFO - Preparing 'glintr100' model...
No module named 'cupy'
Traceback (most recent call last):
  File "prepare_models.py", line 52, in
    prepare_models()
  File "prepare_models.py", line 44, in prepare_models
    prepare_backend(model_name=model, backend_name=env_configs.models.backend_name, im_size=max_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend
    model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 119, in load_model
    model = load_model_from_string(s, format=format)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 156, in load_model_from_string
    return _deserialize(s, ModelProto())
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 97, in _deserialize
    decoded = cast(Optional[int], proto.ParseFromString(s))
google.protobuf.message.DecodeError: Error parsing message with type 'ONNX_REL_1_8.ModelProto'
Starting InsightFace-REST using 2 workers.
[2022-09-24 14:38:11 +0000] [11] [INFO] Starting gunicorn 20.1.0
[2022-09-24 14:38:11 +0000] [11] [INFO] Listening at: http://0.0.0.0:18080 (11)
[2022-09-24 14:38:11 +0000] [11] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-09-24 14:38:11 +0000] [13] [INFO] Booting worker with pid: 13
[2022-09-24 14:38:11 +0000] [14] [INFO] Booting worker with pid: 14
[14:38:12] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[14:38:12] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[14:38:12] INFO - Detector started
[14:38:12] INFO - Warming up face detection ONNX Runtime engine...
[14:38:12] INFO - Detector started
[14:38:12] INFO - Warming up face detection ONNX Runtime engine...
No module named 'cupy'
[2022-09-24 14:38:12 +0000] [13] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 66, in init_process
    super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 843, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/app/app.py", line 36, in
    processing = Processing(det_name=configs.models.det_name, rec_name=configs.models.rec_name,
  File "/app/modules/processing.py", line 82, in __init__
    self.model = FaceAnalysis(det_name=det_name, rec_name=rec_name, ga_name=ga_name,
  File "/app/modules/face_model.py", line 86, in __init__
    self.rec_model = get_model(rec_name, backend_name=backend_name, force_fp16=force_fp16,
  File "/app/modules/model_zoo/getter.py", line 206, in get_model
    model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend
    model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 119, in load_model
    model = load_model_from_string(s, format=format)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 156, in load_model_from_string
    return _deserialize(s, ModelProto())
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 97, in _deserialize
    decoded = cast(Optional[int], proto.ParseFromString(s))
google.protobuf.message.DecodeError: Error parsing message with type 'ONNX_REL_1_8.ModelProto'
[2022-09-24 14:38:12 +0000] [13] [INFO] Worker exiting (pid: 13)
[2022-09-24 14:38:12 +0000] [14] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 66, in init_process
    super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 843, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/app/app.py", line 36, in
    processing = Processing(det_name=configs.models.det_name, rec_name=configs.models.rec_name,
  File "/app/modules/processing.py", line 82, in __init__
    self.model = FaceAnalysis(det_name=det_name, rec_name=rec_name, ga_name=ga_name,
  File "/app/modules/face_model.py", line 86, in __init__
    self.rec_model = get_model(rec_name, backend_name=backend_name, force_fp16=force_fp16,
  File "/app/modules/model_zoo/getter.py", line 206, in get_model
    model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend
    model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 119, in load_model
    model = load_model_from_string(s, format=format)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 156, in load_model_from_string
    return _deserialize(s, ModelProto())
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 97, in _deserialize
    decoded = cast(Optional[int], proto.ParseFromString(s))
google.protobuf.message.DecodeError: Error parsing message with type 'ONNX_REL_1_8.ModelProto'
No module named 'cupy'
[2022-09-24 14:38:12 +0000] [14] [INFO] Worker exiting (pid: 14)
[2022-09-24 14:38:12 +0000] [11] [WARNING] Worker with pid 14 was terminated due to signal 15
[2022-09-24 14:38:13 +0000] [11] [INFO] Shutting down: Master
[2022-09-24 14:38:13 +0000] [11] [INFO] Reason: Worker failed to boot.

spacemolly avatar Sep 24 '22 14:09 spacemolly

Hi! That's strange. I have just built the image from scratch without cache and pulled the latest python:3.8-slim base image, and everything works as expected.

SthPhoenix avatar Sep 24 '22 15:09 SthPhoenix

Have you managed to figure out the issue? I still can't reproduce it.

SthPhoenix avatar Sep 29 '22 21:09 SthPhoenix

Hello, have you solved this problem? I have run into the same issue. If so, I hope to see your reply.

jinzaz avatar Oct 29 '22 11:10 jinzaz

I wasn't able to reproduce this bug. @jinzaz, could you share more info about your environment? Are you running the service in Docker? Could you compute the md5 sums of the onnx files? It might be that the model files were corrupted during download.

P.S. I have just tried building it again without cache and everything still works as expected. More details are required to figure out your issue.

P.P.S. I have just committed an update to the model configs that adds md5 checksums. If a model was corrupted during download, it should be downloaded again on the next launch.

SthPhoenix avatar Oct 29 '22 12:10 SthPhoenix

The correct md5 sum for the glintr100 model is 3b366b98f786426f79629ddb2e56629c. If you get a different checksum, you just need to re-download the model; with the latest commit it will be downloaded again automatically on start.
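For anyone who wants to check a model file by hand, here is a minimal sketch of the comparison described above. The helper names `md5sum` and `is_intact` are hypothetical and not part of InsightFace-REST; only the checksum value comes from this thread.

```python
import hashlib
from pathlib import Path

# Checksum quoted in this thread for the glintr100 model.
GLINTR100_MD5 = "3b366b98f786426f79629ddb2e56629c"


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """True if the file exists and its MD5 matches the expected value."""
    path = Path(path)
    return path.is_file() and md5sum(path) == expected_md5.lower()
```

A call such as `is_intact("models/onnx/glintr100/glintr100.onnx", GLINTR100_MD5)` returning `False` would suggest the file is missing or was corrupted during download.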

SthPhoenix avatar Oct 29 '22 13:10 SthPhoenix

@SthPhoenix I'm not sure whether the models are there at all. I used deploy_cpu.sh to build and run the Docker image, but there is no onnx file in the resulting models folder. Do I need to download the models myself?

jinzaz avatar Oct 31 '22 01:10 jinzaz

@SthPhoenix here are my error logs:

entrypoint.sh: line 2: $'\r': command not found
Preparing models...
': [Errno 2] No such file or directorys.py
entrypoint.sh: line 5: $'\r': command not found
Starting InsightFace-REST using 1 workers.
[2022-10-30 10:26:17 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-10-30 10:26:17 +0000] [1] [INFO] Listening at: http://0.0.0.0:18080 (1)
[2022-10-30 10:26:17 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-10-30 10:26:17 +0000] [13] [INFO] Booting worker with pid: 13
[2022-10-30 10:26:19 +0000] [13] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 66, in init_process
    super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 843, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/app/app.py", line 34, in
    processing = Processing(det_name=settings.models.det_name, rec_name=settings.models.rec_name,
  File "/app/modules/processing.py", line 78, in __init__
    self.model = FaceAnalysis(det_name=det_name,
  File "/app/modules/face_model.py", line 72, in __init__
    self.det_model = Detector(det_name=det_name, max_size=self.max_size,
No module named 'cupy'
  File "/app/modules/face_model.py", line 30, in __init__
    self.retina = get_model(det_name, backend_name=backend_name, force_fp16=force_fp16, im_size=max_size,
  File "/app/modules/model_zoo/getter.py", line 206, in get_model
    model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend
    model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 118, in load_model
    s = _load_bytes(f)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 32, in _load_bytes
    with open(cast(Text, f), 'rb') as readable:
FileNotFoundError: [Errno 2] No such file or directory: '/models/onnx/scrfd_2.5g_gnkps/scrfd_2.5g_gnkps.onnx'
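The `$'\r': command not found` lines at the top of this log are what a shell typically prints when a script has Windows (CRLF) line endings, for example after checking out the repository on Windows with line-ending conversion enabled. That reading is an inference from the log and is not confirmed anywhere in this thread. A minimal sketch for checking and converting a script, with hypothetical helper names `has_crlf` and `convert_to_lf`:

```python
from pathlib import Path


def has_crlf(path):
    """True if the file contains Windows (CRLF) line endings."""
    return b"\r\n" in Path(path).read_bytes()


def convert_to_lf(path):
    """Rewrite the file in place with Unix (LF) line endings."""
    p = Path(path)
    p.write_bytes(p.read_bytes().replace(b"\r\n", b"\n"))
```

If `has_crlf("entrypoint.sh")` returns `True`, converting the file and rebuilding the image may let the model preparation step run, assuming CRLF is indeed the cause here.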

jinzaz avatar Oct 31 '22 01:10 jinzaz

It might be that Google Drive is inaccessible in your region. You can try downloading the models manually from Google Drive through a proxy: scrfd_2.5g_gnkps, glintr100

Models should be placed under the following path: repo_root/models/onnx/{model_name}/{model_name}.onnx

SthPhoenix avatar Oct 31 '22 17:10 SthPhoenix

@SthPhoenix If I download the model file first, which folder should I put it in before copying it to the Docker container?

jinzaz avatar Nov 03 '22 06:11 jinzaz

@SthPhoenix If I download the model file first, which folder should I put it in before copying it to the Docker container?

From https://github.com/SthPhoenix/InsightFace-REST/issues/99#issuecomment-1297397758:

Models should be placed under following path:
repo_root/models/onnx/{model_name}/{model_name}.onnx

For example, the glintr100 model should be in

repo_root/models/onnx/glintr100/glintr100.onnx
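As a rough sketch of that layout, the following builds the expected path for each model and reports which files are still missing. The function names `model_path` and `missing_models` are made up for illustration; only the directory pattern comes from the comment quoted above.

```python
from pathlib import Path


def model_path(repo_root, model_name):
    """Build repo_root/models/onnx/{model_name}/{model_name}.onnx."""
    return Path(repo_root) / "models" / "onnx" / model_name / f"{model_name}.onnx"


def missing_models(repo_root, model_names):
    """Return the expected model paths that do not exist yet."""
    paths = (model_path(repo_root, name) for name in model_names)
    return [p for p in paths if not p.is_file()]
```

Calling `missing_models(".", ["scrfd_2.5g_gnkps", "glintr100"])` from the repository root lists the files that still need to be downloaded and put in place.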

felixdollack avatar Nov 03 '22 16:11 felixdollack

It might be that Google Drive is inaccessible in your region. You can try downloading the models manually from Google Drive through a proxy: scrfd_2.5g_gnkps, glintr100

Models should be placed under the following path: repo_root/models/onnx/{model_name}/{model_name}.onnx

I downloaded scrfd_2.5g_gnkps.onnx and checked its md5sum. I get a711d520006b358240836689b26ab4b4, but you said it should be 50febd32caa699ef7a47cf7422c56bbd for scrfd_2.5g_gnkps.onnx.

kuanyshbakytuly avatar May 13 '24 03:05 kuanyshbakytuly