
Method to release dlib resources to manage GPU resources

Open pliablepixels opened this issue 6 years ago • 8 comments

(relates to #868 #722)

  • face_recognition version: 1.2.3 (face_recognition.__version__)
  • Python version: 3.6.8
  • Operating System: Ubuntu 18.04 (Bionic)

Description of problem

Dlib releases its memory cleanly when the relevant objects are deleted (or go out of scope). However, face_recognition creates module-level variables that get instantiated at import time (here and here), which makes it hard for long-running apps to delete them and conserve GPU memory. The two biggest memory consumers are cnn_face_detection_model and face_encoder, which stay resident in memory; if we implement a web server similar to your example here, we run out of memory very quickly after a few API calls.

To solve this and manage memory, I have this current workaround implemented in my code:

import dlib
import face_recognition

# In my class where I use face_recognition

def clean_dlib(self):
    # Drop the module-level models so dlib can free the GPU memory
    del face_recognition.api.cnn_face_detector
    del face_recognition.api.face_encoder

def init_dlib(self):
    # Recreate the module-level models before the next batch of calls
    face_recognition.api.cnn_face_detector = dlib.cnn_face_detection_model_v1(face_recognition.api.cnn_face_detection_model)
    face_recognition.api.face_encoder = dlib.face_recognition_model_v1(face_recognition.api.face_recognition_model)

I call self.init_dlib() before I use your methods and self.clean_dlib() right after, like so:

self.init_dlib()
face_locations = face_recognition.face_locations(...)
face_encodings = face_recognition.face_encodings(...)
<do the rest of face_comparison etc>
self.clean_dlib()

Obviously, this results in a speed decrease because the model and encodings get reloaded each time, but it lets me manage memory and avoid running out after just a few calls (I have a 1050Ti with 4 GB).
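As an aside, the same init/clean pair can be wrapped in a context manager so the models are always released even if a recognition call raises. A minimal sketch (the scoped_models name is my own; load and release stand in for init_dlib/clean_dlib):

```python
from contextlib import contextmanager

@contextmanager
def scoped_models(load, release):
    """Load heavy models on entry and release them on exit, even on errors.

    `load` and `release` are callables wrapping the init_dlib()/clean_dlib()
    pair from the workaround above.
    """
    models = load()
    try:
        yield models
    finally:
        release()
```

Used as `with scoped_models(self.init_dlib, self.clean_dlib): ...`, this guarantees clean_dlib runs even when face_locations/face_encodings throw.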

Ask

I was wondering if you have an alternate suggestion or, maybe, would consider a clean API to clear resources?
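For illustration, such an API could take the shape of lazily loaded, explicitly releasable model handles. This is only a sketch of the idea, not anything the library provides (LazyModel and its methods are invented names):

```python
class LazyModel:
    """Load an expensive model on first use; release() drops it again."""

    def __init__(self, factory):
        # factory would be e.g. lambda: dlib.cnn_face_detection_model_v1(path)
        self._factory = factory
        self._model = None

    def get(self):
        # Load lazily instead of at import time
        if self._model is None:
            self._model = self._factory()
        return self._model

    def release(self):
        # dlib frees the GPU memory once nothing references the model
        self._model = None
```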

pliablepixels avatar Aug 05 '19 16:08 pliablepixels

That is a good idea. I'll think about doing something in the future, but PRs are welcome if anyone else does it first. As a (lame) workaround, you could always import face_recognition inside a function to make it go out of scope when the function ends.
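Note that a plain function-local import is cached in sys.modules, so the module's model objects actually survive the function; making that workaround release memory needs an explicit purge. A generic sketch of the idea, demonstrated here with a stdlib module standing in for face_recognition (with face_recognition you would also be purging its api submodule):

```python
import gc
import importlib
import sys

def call_and_unload(module_name, func_name, *args):
    """Import a module, call one of its functions, then purge the module
    (and its submodules) from sys.modules so its module-level objects can
    be garbage-collected before the next call."""
    mod = importlib.import_module(module_name)
    try:
        return getattr(mod, func_name)(*args)
    finally:
        # Build the list first, then delete, so nothing pins the module
        stale = [n for n in sys.modules
                 if n == module_name or n.startswith(module_name + ".")]
        for name in stale:
            del sys.modules[name]
        gc.collect()  # encourage prompt release of the heavy objects
```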

That being said, you shouldn't run out of GPU memory after a few calls unless something weird is happening. Maybe it is because of the way you are calling the library from inside a class and hanging on to a duplicate reference there or something? You don't need to embed the face recognition library inside a class.

ageitgey avatar Aug 20 '19 12:08 ageitgey

Hi, I am not really doing anything special here. The situation is also replicable in your web service example as #722 points out.

pliablepixels avatar Aug 20 '19 13:08 pliablepixels

Did you find any other solution regarding this memory issue?

mohitwadhwa2 avatar Mar 23 '21 06:03 mohitwadhwa2

For debugging, you could use app.run(threaded=False); the GPU memory will then be released.

epicchen avatar Mar 29 '21 13:03 epicchen

Hi, I'm having the same issue, and I find it quite annoying that the only fix available is this workaround. It's important for me to have a service running 24/7, and it would be logical for the implementation not to leak memory in continuous operation without deserializing the model from disk again and again. Are there any updates on this? Is this a dlib or a face_recognition problem? Thanks in advance.

flariut avatar Sep 07 '21 17:09 flariut

@flariut, on a production server you should use something like Nginx in front. It will manage multithreading and release memory by itself.

epicchen avatar Oct 09 '21 11:10 epicchen

I have memory problems too:

corrupted size vs. prev_size while consolidating
malloc(): invalid next size (unsorted)
free(): corrupted unsorted chunks

What is the reason for these errors?

Bah1996 avatar Dec 12 '22 13:12 Bah1996

Updating my issue: for me it was resolved by changing the way we manage threads in our application. Apparently this is a CUDA issue with threading, not a dlib or face_recognition problem. If you work by spawning and killing threads, the CUDA backend leaks a bit of memory even if you free and delete all references to the thread and its objects. The solution is to use a pool of worker threads. https://github.com/davisking/dlib/issues/1381#issuecomment-599146089
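The fix described above, reusing a fixed pool of worker threads instead of spawning and killing a thread per request, can be sketched with the standard library (recognize is a placeholder for the real dlib/face_recognition calls):

```python
from concurrent.futures import ThreadPoolExecutor

# One long-lived pool: each worker thread sets up its CUDA context once and
# reuses it, instead of leaking a little GPU memory with every short-lived
# thread that is spawned and killed.
executor = ThreadPoolExecutor(max_workers=2)

def recognize(image):
    # Placeholder for face_recognition.face_locations/face_encodings calls.
    return f"processed {image}"

def handle_request(image):
    # Submit to the persistent pool instead of threading.Thread(...).start().
    return executor.submit(recognize, image).result()
```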

flariut avatar May 11 '23 17:05 flariut