DeepFaceLab
Performance: replace a slow numpy function
This change will reduce the CPU load for GPU-bound tasks or the iteration time for CPU-bound tasks.
According to the NumPy documentation, np.clip is "Equivalent to but faster than np.minimum(a_max, np.maximum(a, a_min))".
However, https://github.com/numpy/numpy/issues/14281 and some simple tests show the opposite. I ran this check with a CPU-bound (i5-3570K, 4.4 GHz) SAEHD 96 training on the pretrain_faces dataset:
@echo off
call _internal\setenv.bat
python "%DFL_ROOT%\main.py" train ^
--training-data-src-dir ".\_internal\pretrain_faces" ^
--training-data-dst-dir ".\_internal\pretrain_faces" ^
--pretraining-data-dir "%INTERNAL%\pretrain_faces" ^
--model-dir "%WORKSPACE%\model" ^
--model SAEHD
pause
And got these results:
np.clip(): 143.4 s per 300 it = 478 ms/it
np.clip() writing to the same array: 142.8 s per 300 it = 476 ms/it
np.minimum() + np.maximum() writing to the same array: 119.5 s per 300 it = 398 ms/it
arrays check CPU load reduced by 16%
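For reference, the three variants compared above can be sketched roughly as follows. This is only an illustration, not the actual DeepFaceLab code: the function names, the 96x96x3 float32 array, and the [0, 1] clipping range are assumptions, and "to a same array" is interpreted here as passing `out=` so the result is written back into the input. Absolute timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def clip_new(a, a_min, a_max):
    """Variant 1: np.clip() returning a newly allocated array."""
    return np.clip(a, a_min, a_max)


def clip_inplace(a, a_min, a_max):
    """Variant 2: np.clip() writing the result back into the same array."""
    np.clip(a, a_min, a_max, out=a)
    return a


def minmax_inplace(a, a_min, a_max):
    """Variant 3: np.maximum() then np.minimum(), both writing into the same array."""
    np.maximum(a, a_min, out=a)  # raise values below a_min
    np.minimum(a, a_max, out=a)  # lower values above a_max
    return a


if __name__ == "__main__":
    # Hypothetical test data: one image-sized float32 array with values outside [0, 1].
    rng = np.random.default_rng(0)
    base = rng.uniform(-0.5, 1.5, size=(96, 96, 3)).astype(np.float32)

    for fn in (clip_new, clip_inplace, minmax_inplace):
        # Copy inside the timed call so every run starts from unclipped data;
        # the copy cost is the same for all three variants.
        seconds = timeit.timeit(lambda: fn(base.copy(), 0.0, 1.0), number=2000)
        print(f"{fn.__name__}: {seconds / 2000 * 1e6:.1f} us per call")
```

All three functions produce the same values, so the choice between them only affects speed and whether the input array is modified.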