
Google Coral support

Open muebau opened this issue 3 years ago • 47 comments

Offload the ML part to a Google Coral accelerator. If the Google Coral USB accelerator (Google Edge TPU) could be used by CompreFace, it would offload the ML part. It could also run on ARM, since Docker images supporting the Google Edge TPU exist for ARM as well.

I got it running for another project (https://github.com/matiasdelellis/facerecognition) as a simple way to provide a version with modularization (https://github.com/matiasdelellis/facerecognition/issues/210) in mind, so that it can run on e.g. a Raspberry Pi 4.

I think this project could be used the way I did with my simple Docker container. The Nextcloud facerecognition app could use CompreFace as an engine for the ML part, and with ML accelerators (like Google Coral or others) it would offer enough ML power to be usable on "home Nextcloud clouds" in your closet 😉.

muebau avatar Jun 23 '21 11:06 muebau

I totally agree with you that support for the Google Coral accelerator would be great. But I described all the problems here: https://github.com/exadel-inc/CompreFace/issues/519#issuecomment-848057021

pospielov avatar Jun 23 '21 20:06 pospielov

With the PR https://github.com/exadel-inc/CompreFace/pull/580 for Coral support, is this looking any more possible?

I tried looking through the contents of the PR but couldn't actually tell whether it's possible to start using it without further code changes.

@pospielov or @iamrinni , can you comment?

iamacarpet avatar Nov 20 '21 20:11 iamacarpet

Irina did a great job and managed to run the compreface-core functionality on Google Coral. The problem she faced was putting this functionality into a Docker container. There are instructions on how to use Coral on Linux, but she has macOS. Unfortunately, she left the project, and this task is stuck. I don't have time for it right now. I'll probably return to it next year, or sooner if we find another contributor with a Google Coral.

pospielov avatar Nov 21 '21 09:11 pospielov

Thanks @pospielov ,

Can you offer any advice on how to get moving with it?

My Python skills are pretty much nonexistent, but if it's just a matter of wrapping everything up in Docker, I might be able to help.

The pending PR seems to have Docker images included… Did these not work in testing?

Haven't properly gotten to grips with the architecture to know how everything hangs together yet, but is the idea that the only thing that needs replacing is the container(s) running TensorFlow, and then it should "just work"?

I understand that might be a time-consuming question to answer, so only if you have time; any hints in the right direction would be appreciated.

iamacarpet avatar Nov 21 '21 11:11 iamacarpet

Here is the new Dockerfile: https://github.com/exadel-inc/CompreFace/blob/EFRS-1114/embedding-calculator/tpu.Dockerfile.full

As I understand it, in Irina's environment it didn't see the Google Coral. Probably the problem is with macOS. If you have Linux, you can try it out; it may already work. Here is a doc with an example of using Google Coral with Docker: https://medium.com/star-gazers/running-tensorflow-lite-at-the-edge-with-raspberry-pi-google-coral-and-docker-83d01357469c

I think you can build it with this command:

docker build -t embedding-calculator --build-arg SKIP_TESTS=true .

And run it with this one:

docker run -p 3000:3000 --privileged -v /dev/bus/usb:/dev/bus/usb embedding-calculator

Then, if everything looks OK in the logs, you can open Swagger in the browser at http://localhost:3000/apidocs and invoke the /find_faces request. If everything works, we can try to put it into docker-compose and check the whole system.

pospielov avatar Nov 22 '21 08:11 pospielov
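To wire this into the full stack later, the USB passthrough from the run command above would also need to end up in docker-compose. A hypothetical fragment (the service name and image tag are assumptions for illustration, not the project's actual compose file):

```yaml
services:
  compreface-core:
    image: embedding-calculator   # the locally built image from the commands above
    privileged: true              # simplest way to reach the Edge TPU over USB
    volumes:
      - /dev/bus/usb:/dev/bus/usb # pass the Coral USB device through
    ports:
      - "3000:3000"
```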

Sorry I'm jumping onboard this issue, but I wanted to share my progress:

So I tried running this in my current test env and ran into some issues (Unraid OS, which is a stripped-down Linux). I fixed most of them, but I couldn't fix some (mainly Python-related, and because of the stripped OS). So I'm currently spinning up an Ubuntu 20.04 VM in my lab to see if I can get this working. I have a spare Coral that I'm going to use for testing. If I get a successful build plus the Swagger stuff running, I'll let you know.

patatman avatar Nov 22 '21 15:11 patatman

Small update: the container build succeeded, but I can't seem to run it.

[uWSGI] getting INI configuration from uwsgi.ini
*** Starting uWSGI 2.0.19 (64bit) on [Mon Nov 22 16:42:25 2021] ***
compiled with version: 8.3.0 on 22 November 2021 16:30:18
os: Linux-5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021
nodename: 16e81a2d3def
machine: x86_64
clock source: unix
detected number of CPU cores: 3
current working directory: /app/ml
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
setgid() to 33
setuid() to 33
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address 0.0.0.0:3000 fd 3
Python version: 3.7.3 (default, Jan 22 2021, 20:04:44)  [GCC 8.3.0]
Python main interpreter initialized at 0x55f50b2fd050
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 218712 bytes (213 KB) for 2 cores
*** Operational MODE: preforking ***
Traceback (most recent call last):
  File "./src/app.py", line 25, in <module>
    from src.init_runtime import init_runtime
  File "./src/init_runtime.py", line 21, in <module>
    from src._logging import init_logging
  File "./src/_logging.py", line 22, in <module>
    from yaml import YAMLLoadWarning
ImportError: cannot import name 'YAMLLoadWarning' from 'yaml' (/usr/local/lib/python3.7/dist-packages/yaml/__init__.py)
unable to load app 0 (mountpoint='') (callable not found or import error)
*** no app loaded. GAME OVER ***

I noticed the branch from Irina was behind master, so I merged master in, and now it's building again. Unfortunately the build takes quite long, so I'm just waiting on that to see if it fixes the issue.

patatman avatar Nov 22 '21 16:11 patatman

https://github.com/exadel-inc/CompreFace/commit/fe08d90abab5cabd6829c684c6bd39a8ea86d196#diff-462581714c2d689beb979af2fa29b0c9122382efafbe9133b4319d79c1c8d6e8

This build problem can be fixed by adding PyYAML==5.4.1 to the requirements.txt file. So yes, merging master into this branch should fix it. But other problems could appear, who knows. So I would recommend fixing it directly in requirements.txt, and merging master only after making sure that everything works.

pospielov avatar Nov 22 '21 17:11 pospielov
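Pinning PyYAML==5.4.1 works because YAMLLoadWarning was removed in PyYAML 6.0, which is what makes the import in src/_logging.py blow up. An alternative, if anyone prefers to keep a newer PyYAML, would be to make that import tolerant; a minimal sketch (keeping the same class name so the rest of the module is untouched):

```python
# Sketch: tolerate both PyYAML < 6.0 (which ships YAMLLoadWarning)
# and PyYAML >= 6.0 (which removed it).
try:
    from yaml import YAMLLoadWarning
except ImportError:  # PyYAML >= 6.0, or PyYAML not installed at all
    class YAMLLoadWarning(UserWarning):
        """Stand-in so warnings filters referencing the name still work."""
```

Either way, the requirements.txt pin is the smaller change for this branch.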

Last update: I got the container running; the merge from master seems to have done the trick. I'm currently trying to scan an image using the Swagger UI, but I'm getting the following error:

{"severity": "CRITICAL", "message": "RuntimeError: Node number 3 (CONV_2D) failed to invoke.\n", "request": {"method": "POST", "path": "/find_faces?limit=0", "filename": "download.jpeg", "api_key": "", "remote_addr": "192.168.1.174"}, "logger": "src.services.flask_.error_handling", "module": "error_handling", "traceback": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.7/dist-packages/flask/app.py\", line 1950, in full_dispatch_request\n    rv = self.dispatch_request()\n  File \"/usr/local/lib/python3.7/dist-packages/flask/app.py\", line 1936, in dispatch_request\n    return self.view_functions[rule.endpoint](**req.view_args)\n  File \"./src/services/flask_/needs_attached_file.py\", line 32, in wrapper\n    return f(*args, **kwargs)\n  File \"./src/_endpoints.py\", line 72, in find_faces_post\n    face_plugins=face_plugins\n  File \"./src/services/facescan/plugins/mixins.py\", line 44, in __call__\n    faces = self._fetch_faces(img, det_prob_threshold)\n  File \"./src/services/facescan/plugins/mixins.py\", line 51, in _fetch_faces\n    boxes = self.find_faces(img, det_prob_threshold)\n  File \"./src/services/facescan/plugins/facenet/coralmtcnn/coralmtcnn.py\", line 78, in find_faces\n    detect_face_result = fdn.detect_faces(img)\n  File \"/usr/local/lib/python3.7/dist-packages/mtcnn_tflite/MTCNN.py\", line 308, in detect_faces\n    result = stage(img, result[0], result[1])\n  File \"/usr/local/lib/python3.7/dist-packages/mtcnn_tflite/MTCNN.py\", line 357, in __stage1\n    pnetlite.invoke()\n  File \"/usr/local/lib/python3.7/dist-packages/tensorflow_core/lite/python/interpreter.py\", line 493, in invoke\n    self._interpreter.Invoke()\n  File \"/usr/local/lib/python3.7/dist-packages/tensorflow_core/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py\", line 113, in Invoke\n    return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_Invoke(self)\nRuntimeError: Node number 3 (CONV_2D) failed to invoke.\n\n", 
"build_version": "dev"}
2021-11-22 17:10:35.050104: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

I currently don't have time to debug any further; I will continue a bit later. Maybe someone can continue based on what I provided. I pushed my changes to my own fork here: https://github.com/patatman/CompreFace/tree/EFRS-1114

patatman avatar Nov 22 '21 17:11 patatman

I continued today a bit, and I ruled out some issues:

  • The USB is properly passed to the container
  • Google Coral is working, tested this with the Google Example image

It really looks like this part is throwing the error:

2021-11-23 20:09:37.524992: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

I'm not sure why, though, and I'm not a programmer. If someone with a better understanding of Python or this package is willing to help, that would be awesome! I'm open to a pair-programming session to debug/solve this issue.

patatman avatar Nov 23 '21 20:11 patatman

CUDA is the name of NVIDIA's GPU compute platform, and from the file names, it looks like it is using TensorFlow rather than TensorFlow Lite (which is required for Coral).

In the PR's main comment block, she referenced a function where you have to pass in the parameter "TPU", but I didn't understand what that referred to.

iamacarpet avatar Nov 24 '21 09:11 iamacarpet

I had a closer look at the PR, but even after changing the code to:

    def _calculate_embeddings(self, cropped_images, mode='CPU'):
        """Run forward pass to calculate embeddings"""
        if mode == 'TPU':
            calc_model = self._embedding_calculator_tpu
        else:
            calc_model = self._embedding_calculator_tpu
            # cropped_images = [prewhiten(img).astype(np.float32) for img in cropped_images]

Basically forcing it to always use the _embedding_calculator_tpu part (if I'm correct). It still gives the same error.

patatman avatar Nov 24 '21 11:11 patatman
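Rather than hardcoding the branch as in the snippet above, a small switch driven by an environment variable would let you flip between CPU and TPU without rebuilding the image. A sketch (CALCULATOR_MODE is a hypothetical variable name, not one the project currently reads):

```python
import os

def resolve_mode(default: str = "CPU") -> str:
    """Pick the embedding-calculator mode from a (hypothetical)
    CALCULATOR_MODE env var; fall back to the default on any other value."""
    mode = os.getenv("CALCULATOR_MODE", default).upper()
    return mode if mode in ("CPU", "TPU") else default
```

The _calculate_embeddings call site could then use resolve_mode() instead of a literal, and the mode would be set with -e CALCULATOR_MODE=TPU on docker run.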

The error below doesn't seem to be Google Coral related; it is also generated if I use the CPU.

2021-11-23 20:09:37.524992: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

I've debugged a ton, and currently I'm stuck on this error:

{
  "message": "RuntimeError: Node number 3 (CONV_2D) failed to invoke.\n"
}

As soon as I try to invoke the find_faces API, it generates this error. The status API works as expected, and the program doesn't crash; it's just that this specific request fails. If I look into the error, some people say it's related to the model used. I'm not familiar enough to debug this any further.

Does anyone have ideas on how to continue? Currently stuck.

patatman avatar Nov 24 '21 13:11 patatman

Kind of a strange error. I tried to build it myself. First, I took the clean branch and fixed only requirements.txt, as I mentioned. Then I built with:

docker build -t embedding-calculator --build-arg SKIP_TESTS=true -f tpu.Dockerfile .

Then I ran it, with and without the Google Coral:

docker run -it --name=test -p 3000:3000 embedding-calculator

It worked normally. I mean, according to the code, by default it will use the CPU, not the TPU.

Then I changed the coralmtcnn.py file the same way you did, built it, and ran it with this command (and with the Google Coral): docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb --name=test -p 3000:3000 embedding-calculator. When I tried to call the find_faces endpoint, I got this error:

Traceback (most recent call last):
    File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
    File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
    File "./src/services/flask_/needs_attached_file.py", line 32, in wrapper
    return f(*args, **kwargs)
    File "./src/_endpoints.py", line 72, in find_faces_post
    face_plugins=face_plugins
    File "./src/services/facescan/plugins/mixins.py", line 46, in __call__
    self._apply_face_plugins(face, face_plugins)
    File "./src/services/facescan/plugins/mixins.py", line 67, in _apply_face_plugins
    raise exceptions.PluginError(f'{plugin} error - {e}')
    src.services.facescan.plugins.exceptions.PluginError: coralmtcnn.Calculator error - Failed to load delegate from libedgetpu.so.1

Looks like the problem with a library. I also got this warning:

{"severity": "DEBUG", "message": "Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.", 

I can't say where the problem is; I need to dig deeper to fix it.

pospielov avatar Nov 24 '21 17:11 pospielov
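"Failed to load delegate from libedgetpu.so.1" usually means the Edge TPU runtime library isn't installed inside the container, or the device isn't visible when the interpreter is created. For reference, this is roughly how a TFLite interpreter gets pointed at the Edge TPU, with a CPU fallback; the helper is a sketch (the library names are the documented per-OS names of the Edge TPU runtime, and the fallback logic is an assumption, not the branch's actual code):

```python
import platform

def delegate_candidates(system: str) -> list:
    """Documented Edge TPU runtime library name for each OS."""
    return {
        "Linux": ["libedgetpu.so.1"],
        "Darwin": ["libedgetpu.1.dylib"],
        "Windows": ["edgetpu.dll"],
    }.get(system, [])

def make_interpreter(model_path: str, use_tpu: bool = True):
    """Sketch: build a TFLite interpreter, preferring the Edge TPU delegate."""
    # Imported lazily so the CPU-only path doesn't require tflite_runtime.
    from tflite_runtime.interpreter import Interpreter, load_delegate
    if use_tpu:
        for lib in delegate_candidates(platform.system()):
            try:
                return Interpreter(model_path,
                                   experimental_delegates=[load_delegate(lib)])
            except (OSError, ValueError):
                continue  # runtime library missing, or no TPU attached
    return Interpreter(model_path)
```

Note that the .tflite model itself must also be compiled for the Edge TPU (with edgetpu_compiler); a regular model will still execute on the CPU even with the delegate attached, which could explain "works the same without the Coral" behavior.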

So I tested again with the clean branch like you mentioned above.

  1. Adjust requirements.txt
  2. Build and run: successful
  3. Adjust embedding-calculator/src/services/facescan/plugins/facenet/coralmtcnn/coralmtcnn.py to ONLY use Coral
  4. Build and run: successful!
  5. Remove the Coral from USB (while the container is running)
  6. Still works? So it wasn't using the Coral after all.
  7. Check the code inside the container, to make sure it's running the version where both branches use self._embedding_calculator_tpu
  8. What is happening? haha. I'm doing some more tests tomorrow.

Your error Failed to load delegate from libedgetpu.so.1 suggests it can't find the Coral. Make sure you have the Coral plugged in before you run the container. That is what I was trying to replicate in step 5, haha.

patatman avatar Nov 24 '21 18:11 patatman

Do you get the same error as me if you unplug the Coral?

pospielov avatar Nov 25 '21 16:11 pospielov

No, I can't seem to get the code to run off the Coral. It doesn't matter whether I have it plugged in, even with the code adjustment to force it. It always uses the CPU instead of the Coral.

patatman avatar Nov 26 '21 11:11 patatman
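One way to rule out the "device not visible" case from inside the container is to check sysfs for the Coral's USB IDs: it enumerates as 1a6e:089a (Global Unichip) before the runtime uploads firmware, and as 18d1:9302 (Google) afterwards. A small Linux-only sketch:

```python
import os

# Unflashed and flashed USB IDs of the Coral USB accelerator
CORAL_USB_IDS = {("1a6e", "089a"), ("18d1", "9302")}

def coral_visible(sys_usb: str = "/sys/bus/usb/devices") -> bool:
    """Return True if a Coral USB accelerator appears in sysfs."""
    if not os.path.isdir(sys_usb):
        return False
    for dev in os.listdir(sys_usb):
        base = os.path.join(sys_usb, dev)
        try:
            with open(os.path.join(base, "idVendor")) as f:
                vid = f.read().strip()
            with open(os.path.join(base, "idProduct")) as f:
                pid = f.read().strip()
        except OSError:
            continue  # interfaces/hub entries without id files
        if (vid, pid) in CORAL_USB_IDS:
            return True
    return False
```

If this returns False inside the running container, the problem is the USB passthrough rather than the Python code.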

I can gladly be part of testing; I have an RPi 4 + Google Coral running Frigate today.

beetlezap avatar Jan 31 '22 08:01 beetlezap

Any updates on support for Google Coral?

isaacolsen94 avatar Jul 18 '22 00:07 isaacolsen94

Unfortunately no, we don't have contributors with Google Coral now

pospielov avatar Jul 18 '22 14:07 pospielov

I know a bit of Python and have a Raspberry Pi 4 8GB (64-bit) with a Coral AI accelerator. What is the latest code with Coral support? I would be happy to help.

archef2000 avatar Aug 20 '22 13:08 archef2000

First, this would be great! Second, it's not so simple

There are two problems to solve:

  1. Add Google Coral support
  2. Build it for arm devices

What I suggest is to first try to build CompreFace with Google Coral support, and then try to build it for ARM devices.

We have a branch, https://github.com/exadel-inc/CompreFace/tree/EFRS-1114, which is quite old, but it's a good starting point. Could you check whether it builds and works with Google Coral?

There are two Dockerfiles: https://github.com/exadel-inc/CompreFace/blob/EFRS-1114/embedding-calculator/tpu.Dockerfile and https://github.com/exadel-inc/CompreFace/blob/EFRS-1114/embedding-calculator/tpu.Dockerfile.full. I'm not sure what the difference is, but they are a good place to start researching.

pospielov avatar Aug 23 '22 09:08 pospielov

Are these Dockerfiles ARM64 compatible?

archef2000 avatar Aug 23 '22 10:08 archef2000

No, we don't have ARM-compatible dockerfiles. This is another challenge

pospielov avatar Aug 25 '22 20:08 pospielov
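For the ARM side, recent Docker can cross-build images with buildx and QEMU emulation, so having an ARM board isn't strictly required to experiment. A hypothetical invocation (the file and tag names assume the EFRS-1114 branch layout; builds under emulation are very slow, and the base images referenced by the Dockerfile must also exist for arm64):

```
# one-time: register QEMU binfmt handlers so an amd64 host can emulate arm64
docker run --privileged --rm tonistiigi/binfmt --install arm64

# cross-build the embedding-calculator for arm64
docker buildx build --platform linux/arm64 \
    --build-arg SKIP_TESTS=true \
    -f tpu.Dockerfile -t embedding-calculator:arm64 --load .
```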

I have all of that, plus a Jetson Nano, along with a test VMware environment running on a Dell PowerEdge R620. Although I'm no programmer, I do have 26 years of IT background, mainly networking.

LordNex avatar Aug 31 '22 07:08 LordNex

https://github.com/exadel-inc/CompreFace/issues/610#issuecomment-1235710009 I described here what we need to do to support ARM devices. Will you be able to help with it?

pospielov avatar Sep 02 '22 16:09 pospielov

#610 (comment) I described here what we need to do to support ARM devices. Will you be able to help with it?

I would love to.

archef2000 avatar Sep 03 '22 15:09 archef2000

https://github.com/exadel-inc/CompreFace/issues/610#issuecomment-1235710009

I described here what we need to do to support ARM devices.

Will you be able to help with it?

So basically run through what you did on the Jetson, and that should compile an image?

LordNex avatar Sep 03 '22 16:09 LordNex

I would like to help with this. I have a Java programming background but am happy with a bit of Python, TypeScript, etc. The only problem is: my Coral USB order says delivery 05/2023 :/ Hoping they get some stock earlier!

leccelecce avatar Dec 22 '22 11:12 leccelecce

I'd like to see it run off the Nano. I have Frigate using the Coral for object detection. It would be nice to use all those CUDA cores for facial recognition. For now I just have it running with Double Take in a VM on a PowerEdge R620, so it's just using Xeon processor cores. Although I'm thinking about trying to find an old NVIDIA card to throw in there and see if that helps.

LordNex avatar Jan 22 '23 21:01 LordNex