Performance: replace a slow numpy function

Open Cregrant opened this issue 2 years ago • 0 comments

This change will reduce the CPU load for GPU-bound tasks or the iteration time for CPU-bound tasks.

According to the NumPy docs, np.clip is:

"Equivalent to but faster than np.minimum(a_max, np.maximum(a, a_min))"

But https://github.com/numpy/numpy/issues/14281 and some simple tests show the opposite. I ran this check with a CPU-bound (i5-3570K, 4.4 GHz) SAEHD 96 training on the pretrain_faces dataset:

@echo off
call _internal\setenv.bat

python "%DFL_ROOT%\main.py" train ^
    --training-data-src-dir ".\_internal\pretrain_faces" ^
    --training-data-dst-dir ".\_internal\pretrain_faces" ^
    --pretraining-data-dir "%INTERNAL%\pretrain_faces" ^
    --model-dir "%WORKSPACE%\model" ^
    --model SAEHD

pause

And got this:

np.clip(): 143.4 s per 300 it = 478 ms/it

np.clip() into the same array: 142.8 s per 300 it = 476 ms/it

np.minimum() + np.maximum() into the same array: 119.5 s per 300 it = 398 ms/it

arrays check CPU load reduced by 16%
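
For anyone who wants to try the swap outside of a training run, here is a minimal sketch of the replacement, assuming float32 image-like arrays clipped to [0, 1]. The helper name clip_minmax, the array shape, and the repeat count are made up for illustration and are not taken from the DeepFaceLab code. The timings it prints depend on the NumPy version and the machine, so they are not a substitute for the training numbers above.

```python
import timeit

import numpy as np


def clip_minmax(a, a_min, a_max, out=None):
    """Hypothetical helper: clip with np.maximum then np.minimum.

    Pass out=a to overwrite the input array instead of allocating a new one.
    """
    out = np.maximum(a, a_min, out=out)     # raise values below a_min
    return np.minimum(out, a_max, out=out)  # lower values above a_max


# Assumed test data: one 96x96 RGB float32 image with out-of-range values.
rng = np.random.default_rng(0)
img = rng.uniform(-0.5, 1.5, size=(96, 96, 3)).astype(np.float32)

# Both approaches should give identical results.
expected = np.clip(img, 0.0, 1.0)
result = clip_minmax(img, 0.0, 1.0)
same = np.array_equal(expected, result)

# In-place variant: writes into the same array.
buf = img.copy()
clip_minmax(buf, 0.0, 1.0, out=buf)
same_inplace = np.array_equal(expected, buf)

# Rough micro-benchmark; absolute numbers vary between setups.
t_clip = timeit.timeit(lambda: np.clip(img, 0.0, 1.0), number=2000)
t_minmax = timeit.timeit(lambda: clip_minmax(img, 0.0, 1.0), number=2000)
print(f"results match: {same}, in-place match: {same_inplace}")
print(f"np.clip: {t_clip:.4f} s, minimum+maximum: {t_minmax:.4f} s")
```

The order of the two calls follows the expression quoted from the docs (maximum against a_min first, then minimum against a_max), so the output should match np.clip whenever a_min <= a_max.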

Cregrant, Jan 31 '23 15:01