Deep-Live-Cam Updates for metal / GPU and performance improves on Silicon Macs

Summary by Sourcery

Enhance performance on Silicon Macs by adding Metal support and updating default settings for video encoding and execution providers. Improve resource management and refactor code for better organization. Update documentation to reflect these changes.

New Features:

Introduce Metal support for improved performance on macOS devices, particularly Silicon Macs.

Enhancements:

Change default video encoder to 'libx265' and improve video quality by setting the default quality to 1.
Update default execution provider to 'coreml' for better performance on Apple devices.
Increase default maximum memory usage on macOS to 6GB and suggest 12 execution threads for better resource utilization.
Refactor image and video processing functions for better code organization and readability.
Improve webcam preview resolution to 1024x768 for better display quality.
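
The defaults listed above could be wired into a command-line parser roughly as follows. This is only a sketch: `--execution-provider` and `--execution-threads` appear in commands later in this thread, while `--video-encoder`, `--video-quality`, `--max-memory`, the helper names, and the non-macOS fallback of 16 GB are assumptions for illustration and are not taken from the PR diff.

```python
import argparse
import platform


def suggest_max_memory(system: str = platform.system()) -> int:
    """Return a default memory cap in GB (6 on macOS, per the summary)."""
    # The non-macOS value is a placeholder assumption, not from the PR.
    return 6 if system == "Darwin" else 16


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose defaults mirror the values in the summary."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--video-encoder", default="libx265")
    parser.add_argument("--video-quality", type=int, default=1)
    parser.add_argument("--execution-provider", nargs="+", default=["coreml"])
    parser.add_argument("--execution-threads", type=int, default=12)
    parser.add_argument("--max-memory", type=int, default=suggest_max_memory())
    return parser


# Parsing an empty argument list shows the defaults.
args = build_parser().parse_args([])
print(args.video_encoder, args.video_quality, args.execution_provider)
```

Keeping the platform-dependent value in a small function makes it easy to see, and to test, what a given OS would get without running on that OS.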

Documentation:

Update README to reflect changes in execution provider usage, replacing 'coreml' with 'metal' for macOS devices.

Aug 13 '24 18:08 jasonkneen

Reviewer's Guide by Sourcery

This pull request implements several updates for metal / GPU and performance improvements on Silicon Macs. The changes focus on optimizing the execution providers, adjusting default settings, and improving compatibility with Apple Silicon devices. Key modifications include forcing the use of CoreML as the execution provider, updating video processing methods, and enhancing GPU utilization for TensorFlow and PyTorch on macOS.

File-Level Changes

| Files | Changes |
| --- | --- |
| `modules/core.py` | Force CoreML as the execution provider and remove support for other providers |
| `modules/core.py` | Update default settings for video encoding, quality, and frame handling |
| `README.md` | Implement Metal support for improved performance on macOS devices |
| `modules/utilities.py` | Modify frame extraction process to use OpenCV instead of ffmpeg |
| `modules/ui.py` | Increase webcam preview resolution and adjust frame rate |
| `modules/processors/frame/face_swapper.py` | Update face swapper model to use non-fp16 version |
| `modules/core.py` | Add configuration and testing for CoreML, TensorFlow with Metal, and PyTorch with MPS |
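
A startup check of the kind described in the last row might look like the sketch below. It is not the PR's code: the function name and report keys are made up for illustration, and the TensorFlow check is left out to keep it short. It relies only on `onnxruntime.get_available_providers()` and `torch.backends.mps.is_available()`, and treats a package that cannot be imported as "not installed".

```python
import importlib
from typing import Any, Dict


def probe_accelerators() -> Dict[str, Any]:
    """Report which acceleration backends are visible; None means unavailable."""
    report: Dict[str, Any] = {"onnxruntime": None, "torch_mps": None}

    try:
        ort = importlib.import_module("onnxruntime")
        # List of provider names, e.g. CoreMLExecutionProvider on macOS builds.
        report["onnxruntime"] = ort.get_available_providers()
    except (ImportError, AttributeError, OSError):
        pass

    try:
        torch = importlib.import_module("torch")
        # True when PyTorch can use Metal Performance Shaders.
        report["torch_mps"] = bool(torch.backends.mps.is_available())
    except (ImportError, AttributeError, OSError):
        pass

    return report


print(probe_accelerators())
```

Note that a provider being listed as available only shows that the runtime can load it. It does not by itself show which hardware ends up doing the work.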


Aug 13 '24 18:08 sourcery-ai[bot]

The package customtkinter is still needed to launch the UI, otherwise:

python run.py --execution-provider coreml
Traceback (most recent call last):
  File "/Users/easto/deep-live-cam-tmp/Deep-Live-Cam/run.py", line 3, in <module>
    from modules import core
  File "/Users/easto/deep-live-cam-tmp/Deep-Live-Cam/modules/core.py", line 22, in <module>
    import modules.ui as ui
  File "/Users/easto/deep-live-cam-tmp/Deep-Live-Cam/modules/ui.py", line 3, in <module>
    import customtkinter as ctk
ModuleNotFoundError: No module named 'customtkinter'
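
A small preflight check could surface this kind of problem before the UI is imported. The sketch below is an illustration, not part of the PR: the function name and the package list are assumptions (only `customtkinter` is confirmed missing by the traceback above; the other names are packages that show up in tracebacks elsewhere in this thread).

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level packages that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative list of top-level import names, not the project's requirements.
REQUIRED = ["customtkinter", "cv2", "insightface", "onnxruntime"]

# Prints the subset of REQUIRED that is not installed in this environment.
print(missing_packages(REQUIRED))
```

`find_spec` only looks the module up and does not import it, so the check stays fast even for heavy packages.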

Aug 14 '24 05:08 snacsnoc

I have tried your PR on the latest commit, on an M2 Pro, and still get very low FPS. I don't hear my fans kick in and my GPU usage is pretty low. Do you think there is any way to improve it?

Aug 14 '24 10:08 hvmzx

According to Activity Monitor, this still uses only the CPU, whereas nsfw-roop does use the GPU.


Aug 14 '24 17:08 snacsnoc


Getting the same on mac M1

Aug 19 '24 12:08 cdrage

@jasonkneen

Issue 1

Running fails because the detection size has changed:

def get_face_analyser() -> Any:
    global FACE_ANALYSER

    if FACE_ANALYSER is None:
        FACE_ANALYSER = insightface.app.FaceAnalysis(name='buffalo_l', providers=modules.globals.execution_providers)
        FACE_ANALYSER.prepare(ctx_id=0, det_size=(1280, 720))
    return FACE_ANALYSER

Reverting back to

        FACE_ANALYSER.prepare(ctx_id=0, det_size=(640, 640))

works. Error:

(venv) [easto@MacBook-Pro][/tmp/Deep-Live-Cam]$ python run.py --execution-provider coreml  --execution-threads 12
Frame processor face_enhancer not found
Downloading: 56.0kB [00:00, 213kB/s]                                                                                                                                                                       
ONNX Runtime version: 1.16.3
Available execution providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
Selected execution provider: CoreMLExecutionProvider (with CPU fallback for face detection)
TensorFlow devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
TensorFlow is using GPU (Metal)
PyTorch is using MPS (Metal Performance Shaders)
Frame processor face_enhancer not found
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (1280, 720)
2024-08-21 12:28:23.815556 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running CoreML_9447659792891585317_6 node. Name:'CoreMLExecutionProvider_CoreML_9447659792891585317_6_6' Status Message: Exception: /Users/cansik/git/private/onnxruntime-silicon/onnxruntime/onnxruntime/core/providers/coreml/model/model.mm:63 InlinedVector<int64_t> (anonymous namespace)::GetStaticOutputShape(gsl::span<const int64_t>, gsl::span<const int64_t>, const logging::Logger &) inferred_shape.size() == coreml_static_shape.size() was false. CoreML static output shape ({1,1,1,7200,1}) and inferred shape ({3200,1}) have different ranks.

Exception in Tkinter callback
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/tkinter/__init__.py", line 1967, in __call__
    return self.func(*args)
           ^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/customtkinter/windows/widgets/ctk_button.py", line 554, in _clicked
    self._command()
  File "/private/tmp/Deep-Live-Cam/modules/ui.py", line 97, in <lambda>
    start_button = ctk.CTkButton(root, text='Start', cursor='hand2', command=lambda: select_output_path(start))
                                                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/modules/ui.py", line 194, in select_output_path
    start()
  File "/private/tmp/Deep-Live-Cam/modules/core.py", line 135, in start
    if not frame_processor.pre_start():
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/modules/processors/frame/face_swapper.py", line 28, in pre_start
    elif not get_one_face(cv2.imread(modules.globals.source_path)):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/modules/face_analyser.py", line 20, in get_one_face
    face = get_face_analyser().get(frame)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/insightface/app/face_analysis.py", line 59, in get
    bboxes, kpss = self.det_model.detect(img,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/insightface/model_zoo/retinaface.py", line 224, in detect
    scores_list, bboxes_list, kpss_list = self.forward(det_img, self.det_thresh)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/insightface/model_zoo/retinaface.py", line 152, in forward
    net_outs = self.session.run(self.output_names, {self.input_name : blob})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running CoreML_9447659792891585317_6 node. Name:'CoreMLExecutionProvider_CoreML_9447659792891585317_6_6' Status Message: Exception: /Users/cansik/git/private/onnxruntime-silicon/onnxruntime/onnxruntime/core/providers/coreml/model/model.mm:63 InlinedVector<int64_t> (anonymous namespace)::GetStaticOutputShape(gsl::span<const int64_t>, gsl::span<const int64_t>, const logging::Logger &) inferred_shape.size() == coreml_static_shape.size() was false. CoreML static output shape ({1,1,1,7200,1}) and inferred shape ({3200,1}) have different ranks.
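
One possible reading of the two sizes in this error (an interpretation, not something confirmed in the thread): if the detector emits 2 anchors per feature-map cell at stride 16, then a 1280x720 input yields 7200 outputs and a 640x640 input yields 3200, which are exactly the numbers in the message. The stride and anchor count are assumptions about the detection model, and the helper name is made up for illustration.

```python
def anchor_count(width: int, height: int, stride: int, anchors_per_cell: int = 2) -> int:
    """Number of detector outputs for one stride level of the feature pyramid."""
    # Assumes the input dimensions divide evenly by the stride.
    return (width // stride) * (height // stride) * anchors_per_cell


print(anchor_count(1280, 720, 16))  # 7200
print(anchor_count(640, 640, 16))   # 3200
```

If that reading is right, one side of the failed comparison was sized for 1280x720 while the other still reflected 640x640, which would be consistent with the revert to `det_size=(640, 640)` making the error go away.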

Issue 2

Additionally, nsfw is missing from modules/globals:

Traceback (most recent call last):
  File "/private/tmp/Deep-Live-Cam/run.py", line 6, in <module>
    core.run()
  File "/private/tmp/Deep-Live-Cam/modules/core.py", line 247, in run
    start()
  File "/private/tmp/Deep-Live-Cam/modules/core.py", line 154, in start
    if modules.globals.nsfw == False:
       ^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'modules.globals' has no attribute 'nsfw'

Adding nsfw = None fixes the error.
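
As an alternative to defining the attribute, the read site could supply a default. The sketch below uses a stand-in object instead of the real `modules.globals`, so the names are illustrative only. It also shows a side effect of the `nsfw = None` fix: `None == False` is `False`, so the `if modules.globals.nsfw == False:` branch from the traceback would be skipped. Whether that is the intended behaviour is not stated in the thread.

```python
import types

# Stand-in for modules.globals before the fix: no nsfw attribute defined.
fake_globals = types.SimpleNamespace()

# Reading with a default avoids the AttributeError from the traceback.
nsfw = getattr(fake_globals, "nsfw", False)
print(nsfw)  # False

# With the fix from the comment (nsfw = None), `== False` evaluates to False,
# so a branch guarded by that comparison does not run.
fake_globals.nsfw = None
print(fake_globals.nsfw == False)  # False
```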

GPU Usage

Lastly, this still fails to use the GPU on Apple Silicon (MacBook Pro M2 Max):

Frame processor face_enhancer not found
ONNX Runtime version: 1.16.3
Available execution providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
Selected execution provider: CoreMLExecutionProvider (with CPU fallback for face detection)
TensorFlow devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
TensorFlow is using GPU (Metal)
PyTorch is using MPS (Metal Performance Shaders)
Frame processor face_enhancer not found
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
[DLC.CORE] Creating temp resources...
[DLC.CORE] Extracting frames...
Frame processor face_enhancer not found
[DLC.FACE-SWAPPER] Progressing...
Processing:   0%|                                                                         | 0/1626 [00:00<?, ?frame/s, execution_providers=['CoreMLExecutionProvider'], execution_threads=12, max_memory=6]Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
inswapper-shape: [1, 3, 128, 128]
Processing:  63%|███████████████████████████████████████                       | 1023/1626 [01:21<00:46, 12.97frame/s, execution_providers=['CoreMLExecutionProvider'], execution_threads=12, max_memory=6]
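
For comparing runs, the figures in the last progress line can be turned into a throughput number. The sketch below only redoes the arithmetic from that line (1023 of 1626 frames after 01:21, with a displayed rate of 12.97 frame/s); the function names are made up for illustration.

```python
def average_rate(frames_done: int, elapsed_seconds: float) -> float:
    """Mean frames per second since the start of processing."""
    return frames_done / elapsed_seconds


def eta_seconds(frames_done: int, total_frames: int, rate: float) -> float:
    """Seconds left at the given rate."""
    return (total_frames - frames_done) / rate


print(round(average_rate(1023, 81), 2))       # 12.63
print(round(eta_seconds(1023, 1626, 12.97)))  # 46
```

The second value matches the `00:46` remaining time shown in the bar, and the first gives a single number that can be compared against a run on another execution provider.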

Aug 21 '24 19:08 snacsnoc

@hacksider this should not be merged yet as GPU support does not work for mac (still uses CPU)

Sep 19 '24 17:09 cdrage

Mine uses metal and the GPU.

Sep 19 '24 17:09 jasonkneen

Mine uses metal and the GPU.

Neither I nor the 5 other users in this PR see the GPU being used in Activity Monitor, and it's very slow, even with your latest commit :(

Sep 20 '24 04:09 cdrage
