CompreFace
Google Coral support
Offload the ML part to a Google Coral accelerator. If the Google Coral USB accelerator (Google Edge TPU) could be used by CompreFace, it would offload the ML part. It could possibly also run on ARM, as there are Docker images (for ARM too) that support the Google Edge TPU.
I got it running for another project (https://github.com/matiasdelellis/facerecognition) as a simple solution to provide a version with modularization (https://github.com/matiasdelellis/facerecognition/issues/210) in mind, so it can run on e.g. a Raspberry Pi 4.
I think this project could be used the way I used my simple Docker container: the Nextcloud facerecognition app could use CompreFace as an engine for the ML part, and with ML accelerators (like Google Coral or others) it would offer enough ML power to be usable on "home Nextcloud clouds" in your closet 😉.
I totally agree with you that support of Google Coral accelerator would be great. But I described all the problems here: https://github.com/exadel-inc/CompreFace/issues/519#issuecomment-848057021
With the PR https://github.com/exadel-inc/CompreFace/pull/580 for Coral support, is this looking any more possible?
I tried looking through the contents of the PR but couldn't actually tell if it's possible to start using without further code changes.
@pospielov or @iamrinni , can you comment?
Irina did a great job and managed to run compreface-core functionality on the Google Coral. The problem she faced was putting this functionality into a Docker container. There are instructions on how to use the Coral on Linux, but she has macOS. Unfortunately, she left the project, and this task is stuck. I don't have time for it for now. I'll probably return to it next year, or sooner if we find another contributor with a Google Coral.
Thanks @pospielov ,
Can you offer any advice on how to get moving with it?
My python skills are pretty much nonexistent, but if it’s just wrapping everything up in Docker, I might be able to help.
The PR that is pending seems to have Docker images included… Did these not work on testing?
Haven’t properly gotten to grips with the architecture to know how everything hangs together yet, but is the ideal that the only thing that needs replacing is the container(s) running TensorFlow and then it should “just work”?
I understand that might be a time-consuming question to answer; only if you have time, any hints in the right direction would be appreciated.
https://github.com/exadel-inc/CompreFace/blob/EFRS-1114/embedding-calculator/tpu.Dockerfile.full
Here is the new Dockerfile. As I understand it, in Irina's environment it didn't see the Google Coral.
Probably the problem is with macOS.
If you have Linux, you can try it out; it probably already works.
Here is a doc with an example of using the Google Coral with Docker:
https://medium.com/star-gazers/running-tensorflow-lite-at-the-edge-with-raspberry-pi-google-coral-and-docker-83d01357469c
I think you can build it with this command:

```
docker build -t embedding-calculator --build-arg SKIP_TESTS=true .
```

and run it with this:

```
docker run -p 3000:3000 --privileged -v /dev/bus/usb:/dev/bus/usb embedding-calculator
```

Then, if everything is OK in the logs, you can open Swagger in the browser:

http://localhost:3000/apidocs

and invoke the /find_faces request. If everything is OK, then we can try to put it into docker-compose and check the whole system.
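Putting it into docker-compose, as suggested above, might look roughly like this. This is a hedged sketch only: the service name, image tag, and port are assumptions based on the commands in this thread, not a tested configuration.

```yaml
# Hypothetical docker-compose service for the Coral-enabled embedding-calculator.
# Mirrors the `docker run --privileged -v /dev/bus/usb:/dev/bus/usb` invocation above.
services:
  compreface-core:
    image: embedding-calculator   # locally built via `docker build -t embedding-calculator ...`
    privileged: true              # needed so the container can talk to the USB Edge TPU
    volumes:
      - /dev/bus/usb:/dev/bus/usb # pass the host USB bus through to the container
    ports:
      - "3000:3000"
```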
Sorry I'm jumping onboard this issue, but I wanted to share my progress:
So I tried running this in my current test env and ran into some issues (Unraid OS, which is a stripped-down Linux version). I fixed most of them, but couldn't fix some (mainly Python-related, and because of the stripped-down OS). So I'm currently spinning up an Ubuntu 20.04 VM in my lab to see if I can get this working. I have a spare Coral that I'm going to use for testing. If I get a successful build and the Swagger stuff running, I'll let you know.
Small update: the container build was successful, but I can't seem to run it.
```
[uWSGI] getting INI configuration from uwsgi.ini
*** Starting uWSGI 2.0.19 (64bit) on [Mon Nov 22 16:42:25 2021] ***
compiled with version: 8.3.0 on 22 November 2021 16:30:18
os: Linux-5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021
nodename: 16e81a2d3def
machine: x86_64
clock source: unix
detected number of CPU cores: 3
current working directory: /app/ml
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
setgid() to 33
setuid() to 33
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address 0.0.0.0:3000 fd 3
Python version: 3.7.3 (default, Jan 22 2021, 20:04:44) [GCC 8.3.0]
Python main interpreter initialized at 0x55f50b2fd050
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 218712 bytes (213 KB) for 2 cores
*** Operational MODE: preforking ***
Traceback (most recent call last):
  File "./src/app.py", line 25, in <module>
    from src.init_runtime import init_runtime
  File "./src/init_runtime.py", line 21, in <module>
    from src._logging import init_logging
  File "./src/_logging.py", line 22, in <module>
    from yaml import YAMLLoadWarning
ImportError: cannot import name 'YAMLLoadWarning' from 'yaml' (/usr/local/lib/python3.7/dist-packages/yaml/__init__.py)
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. GAME OVER ***
```
I noticed Irina's branch was behind master, so I merged master into it and now it's building again. Unfortunately, the build takes quite long, so I'm just waiting on that to see if it fixes the issue.
https://github.com/exadel-inc/CompreFace/commit/fe08d90abab5cabd6829c684c6bd39a8ea86d196#diff-462581714c2d689beb979af2fa29b0c9122382efafbe9133b4319d79c1c8d6e8
This build problem could be fixed by adding PyYAML==5.4.1 to the requirements.txt file.
So yes, if you merge master into this branch, it should fix this problem. But other problems could appear, who knows. So I would recommend fixing it directly in the requirements.txt file, and merging master only after making sure that everything works.
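For context: `YAMLLoadWarning` exists in PyYAML 5.x but was removed in PyYAML 6.x, which is why the import fails once the dependency drifts. Besides pinning `PyYAML==5.4.1`, a version-tolerant import guard is another option. This is a sketch of that idea, not what the project currently does:

```python
# Hypothetical compatibility shim: YAMLLoadWarning was removed in PyYAML 6.x,
# so fall back to a local stand-in when the import fails.
try:
    from yaml import YAMLLoadWarning  # present in PyYAML 5.x
except ImportError:
    class YAMLLoadWarning(Warning):
        """Stand-in for PyYAML >= 6 (or no PyYAML), where the class no longer exists."""
```

Either way the rest of the logging code can keep referring to `YAMLLoadWarning` unchanged.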
Last update: got the container running; merging from master seems to have done the trick. I'm currently trying to scan an image using the Swagger UI, but I'm getting the following error:
```
{"severity": "CRITICAL", "message": "RuntimeError: Node number 3 (CONV_2D) failed to invoke.\n", "request": {"method": "POST", "path": "/find_faces?limit=0", "filename": "download.jpeg", "api_key": "", "remote_addr": "192.168.1.174"}, "logger": "src.services.flask_.error_handling", "module": "error_handling", "traceback": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.7/dist-packages/flask/app.py\", line 1950, in full_dispatch_request\n rv = self.dispatch_request()\n File \"/usr/local/lib/python3.7/dist-packages/flask/app.py\", line 1936, in dispatch_request\n return self.view_functions[rule.endpoint](**req.view_args)\n File \"./src/services/flask_/needs_attached_file.py\", line 32, in wrapper\n return f(*args, **kwargs)\n File \"./src/_endpoints.py\", line 72, in find_faces_post\n face_plugins=face_plugins\n File \"./src/services/facescan/plugins/mixins.py\", line 44, in __call__\n faces = self._fetch_faces(img, det_prob_threshold)\n File \"./src/services/facescan/plugins/mixins.py\", line 51, in _fetch_faces\n boxes = self.find_faces(img, det_prob_threshold)\n File \"./src/services/facescan/plugins/facenet/coralmtcnn/coralmtcnn.py\", line 78, in find_faces\n detect_face_result = fdn.detect_faces(img)\n File \"/usr/local/lib/python3.7/dist-packages/mtcnn_tflite/MTCNN.py\", line 308, in detect_faces\n result = stage(img, result[0], result[1])\n File \"/usr/local/lib/python3.7/dist-packages/mtcnn_tflite/MTCNN.py\", line 357, in __stage1\n pnetlite.invoke()\n File \"/usr/local/lib/python3.7/dist-packages/tensorflow_core/lite/python/interpreter.py\", line 493, in invoke\n self._interpreter.Invoke()\n File \"/usr/local/lib/python3.7/dist-packages/tensorflow_core/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py\", line 113, in Invoke\n return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_Invoke(self)\nRuntimeError: Node number 3 (CONV_2D) failed to invoke.\n\n", "build_version": "dev"}
2021-11-22 17:10:35.050104: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
```
I currently don't have time to debug further; I'll continue a bit later. Maybe someone can continue based on what I provided. I pushed my changes to my own fork here: https://github.com/patatman/CompreFace/tree/EFRS-1114
I continued a bit today and ruled out some issues:
- The USB is properly passed to the container
- The Google Coral is working; I tested this with the Google example
It really looks like this part is throwing the error:

```
2021-11-23 20:09:37.524992: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
```
I'm not sure why though, and I'm not a programmer. If someone with better understanding of Python or this package is willing to help, that would be awesome! I'm open to do a pair-programming session to debug/solve this issue.
CUDA is the name of Nvidia's GPGPU platform, and from the file names, it looks like it is using TensorFlow rather than TensorFlow Lite (which is required for the Coral).
In the PR's main comment block, she referenced a function where you have to pass in the parameter "TPU", but I didn't understand what it referenced.
I had a closer look at the PR, but even after changing the code to:

```python
def _calculate_embeddings(self, cropped_images, mode='CPU'):
    """Run forward pass to calculate embeddings"""
    if mode == 'TPU':
        calc_model = self._embedding_calculator_tpu
    else:
        calc_model = self._embedding_calculator_tpu
    # cropped_images = [prewhiten(img).astype(np.float32) for img in cropped_images]
```

basically forcing it to always use the `_embedding_calculator_tpu` path (if I'm correct), it still gives the same error.
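To illustrate what the `mode` switch presumably does before that edit, here is a standalone sketch with dummy calculators. The class name and the string "interpreters" are invented for illustration; only the selection logic mirrors the snippet above:

```python
class EmbeddingCalculatorSketch:
    """Toy model of the CPU/TPU dispatch in _calculate_embeddings.

    The real plugin holds two TF Lite interpreters; here they are stand-in
    strings so the selection logic can be shown in isolation.
    """

    def __init__(self):
        self._embedding_calculator = "cpu-interpreter"          # hypothetical CPU model
        self._embedding_calculator_tpu = "edgetpu-interpreter"  # hypothetical Edge TPU model

    def pick_calculator(self, mode="CPU"):
        # Original (unforced) logic: use the Edge TPU interpreter only when asked.
        if mode == "TPU":
            return self._embedding_calculator_tpu
        return self._embedding_calculator


calc = EmbeddingCalculatorSketch()
print(calc.pick_calculator())        # cpu-interpreter
print(calc.pick_calculator("TPU"))   # edgetpu-interpreter
```

The edit above collapses both branches to `_embedding_calculator_tpu`, so the `mode` argument no longer matters; if inference still runs on the CPU after that, it may mean the "TPU" interpreter itself was created without a working Edge TPU delegate.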
The error below doesn't seem Google Coral related; it also appears if I use the CPU.

```
2021-11-23 20:09:37.524992: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
```
I've debugged a ton, and currently I'm stuck on this error:

```
{
  "message": "RuntimeError: Node number 3 (CONV_2D) failed to invoke.\n"
}
```

As soon as I try to invoke the `find_faces` API, it generates this error.
The status API works as expected, and the program doesn't crash; it's just this specific request that generates an error.
Looking into that error, some people say it's related to the model used. I'm not familiar enough to debug this any further.
Anyone have any ideas how to continue? Currently stuck.
Kind of a strange error.
I tried to build it myself. First, I took the clean branch and fixed only `requirements.txt`, as I mentioned.
Then I built with:

```
docker build -t embedding-calculator --build-arg SKIP_TESTS=true -f tpu.Dockerfile .
```

Then ran with and without the Google Coral:

```
docker run -it --name=test -p 3000:3000 embedding-calculator
```

It worked normally; according to the code, by default it uses the CPU, not the TPU.
Then I changed the coralmtcnn.py file the same way as you did.
Built and ran with this command (and with the Google Coral):

```
docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb --name=test -p 3000:3000 embedding-calculator
```

When I tried to call the find_faces endpoint, I got this error:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./src/services/flask_/needs_attached_file.py", line 32, in wrapper
    return f(*args, **kwargs)
  File "./src/_endpoints.py", line 72, in find_faces_post
    face_plugins=face_plugins
  File "./src/services/facescan/plugins/mixins.py", line 46, in __call__
    self._apply_face_plugins(face, face_plugins)
  File "./src/services/facescan/plugins/mixins.py", line 67, in _apply_face_plugins
    raise exceptions.PluginError(f'{plugin} error - {e}')
src.services.facescan.plugins.exceptions.PluginError: coralmtcnn.Calculator error - Failed to load delegate from libedgetpu.so.1
```
Looks like a problem with a library. I also got this warning:

```
{"severity": "DEBUG", "message": "Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.",
```

I can't say where the problem is; I need to dig deeper to fix it.
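The `Failed to load delegate from libedgetpu.so.1` error usually means the Edge TPU runtime library or the USB device isn't visible inside the container. A hedged debugging sketch (standard Linux commands; the library name comes from the error itself, but I haven't verified these paths against this image):

```shell
# Run these inside the embedding-calculator container to narrow down the
# "Failed to load delegate from libedgetpu.so.1" error.

# 1. Is the Edge TPU runtime library installed and resolvable by the loader?
ldconfig -p | grep libedgetpu || echo "libedgetpu.so.1 not found"

# 2. Was the USB bus actually passed through (-v /dev/bus/usb:/dev/bus/usb)?
ls /dev/bus/usb 2>/dev/null || echo "no /dev/bus/usb - check the volume mount"
```

If the first check fails, the Dockerfile never installed the `libedgetpu` runtime; if the second fails, the container was started without the USB passthrough flags.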
So I tested again with the clean branch as you mentioned above:

1. Adjust `requirements.txt`
2. Build and run: successful
3. Adjust `embedding-calculator/src/services/facescan/plugins/facenet/coralmtcnn/coralmtcnn.py` to ONLY use the Coral. Build and run: successful!
4. Remove the Coral from USB (while the container is running)
5. Still works? So it wasn't using the Coral after all.
6. Check the code inside the container, to make sure it's running the version where both branches use `self._embedding_calculator_tpu`

What is happening? haha. I'm doing some more tests tomorrow.
Your error `Failed to load delegate from libedgetpu.so.1` suggests it can't find the Coral. Make sure you have the Coral plugged in before you run the container. I was trying to replicate this when I was running step 5 haha.
Do you have the same error as me if you plug out Coral?
No, I can't seem to get the code to run off the Coral. It doesn't matter if I have it plugged in, even with the code adjustment to force it. It always uses the CPU instead of the Coral.
I can gladly be part of testing today on my RPi 4 + Google Coral running Frigate.
Any updates on support for Google Coral?
Unfortunately no, we don't have contributors with Google Coral now
I know a bit of Python and have a Raspberry Pi 4 (8 GB, 64-bit) with a Coral AI accelerator. What is the latest code with Coral support? I would be happy to help.
First, this would be great! Second, it's not so simple
There are two problems to solve:
- Add Google Coral support
- Build it for arm devices
What I suggest is to first try to build CompreFace with Google Coral support, and then try to build it for ARM devices.
We have a branch https://github.com/exadel-inc/CompreFace/tree/EFRS-1114, which is quite old, but it's a good place to start from. Could you check whether it builds and works with the Google Coral?
There are two Dockerfiles: https://github.com/exadel-inc/CompreFace/blob/EFRS-1114/embedding-calculator/tpu.Dockerfile https://github.com/exadel-inc/CompreFace/blob/EFRS-1114/embedding-calculator/tpu.Dockerfile.full Not sure what the difference is, but they're a good place to start researching.
Are these Dockerfiles ARM64 compatible?
No, we don't have ARM-compatible dockerfiles. This is another challenge
I have all that, plus a Jetson Nano, along with a test VMware environment running on a Dell PowerEdge R620. Although I'm no programmer, I do have 26 years of IT background, mainly networking.
https://github.com/exadel-inc/CompreFace/issues/610#issuecomment-1235710009 I described here what we need to do to support ARM devices. Will you be able to help with it?
I would love to.
So basically, run through what you did on the Jetson, and that should compile an image?
I would like to help with this. I have a Java programming background but am happy with a bit of Python, TypeScript, etc. Only problem: my Coral USB order says delivery 05/2023 :/ hoping they get some earlier stock!
I'd like to see it run off the Nano. I have Frigate using the Coral for object detection; it would be nice to use all those CUDA cores for facial recognition. For now I just have it running with Double Take in a VM on a PowerEdge R620, so it's only using Xeon processor cores. Although I'm thinking about trying to find an old Nvidia card to throw in there and see if that'll help.