
Why there isn't an option for BatchNormalization?

Open Jerry-Master opened this issue 2 years ago • 18 comments

I have seen that you include a BatchNorm2D layer in core.leras but don't use it in any model. Why is that? Batch normalization can help reduce the number of iterations needed for convergence. Also, your current implementation is wrong: you don't update the moving mean and variance. As the official TensorFlow documentation states, you must do:

  • moving_mean = moving_mean * momentum + mean(batch) * (1 - momentum)
  • moving_var = moving_var * momentum + var(batch) * (1 - momentum)

Which you are not doing.
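A minimal sketch of the update being described (illustrative NumPy code, not DFL's actual implementation; the momentum value and function name are made up for the example):

```python
import numpy as np

# Persistent per-channel state, initialized once.
moving_mean = np.zeros(3)
moving_var = np.ones(3)

def bn_train_step(batch, moving_mean, moving_var, momentum=0.99, eps=1e-5):
    """Normalize `batch` (shape N, H, W, C) with the batch statistics
    and update the moving statistics as the TF docs describe."""
    mean = batch.mean(axis=(0, 1, 2))          # per-channel batch mean
    var = batch.var(axis=(0, 1, 2))            # per-channel batch variance
    out = (batch - mean) / np.sqrt(var + eps)  # normalized activations
    # The update the issue is about:
    new_mean = moving_mean * momentum + mean * (1 - momentum)
    new_var = moving_var * momentum + var * (1 - momentum)
    return out, new_mean, new_var

x = np.random.randn(8, 4, 4, 3).astype(np.float32)
out, moving_mean, moving_var = bn_train_step(x, moving_mean, moving_var)
```

At inference time the layer would then normalize with `moving_mean` / `moving_var` instead of the batch statistics.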

Jerry-Master avatar Dec 20 '22 14:12 Jerry-Master

Would you consider forking and fixing it?

zabique avatar Jan 21 '23 17:01 zabique

Just made a PR here.

Jerry-Master avatar Feb 07 '23 10:02 Jerry-Master

Thanks for improving DFL, I'm testing your PR.

zabique avatar Feb 07 '23 10:02 zabique

I have not included the layer in any model. If you want I can create a new archi adding the layers after every convolutional layer.

Jerry-Master avatar Feb 07 '23 11:02 Jerry-Master

It would be awesome to test it in SAEHD LIAE-UDT. Maybe a DFL fork wouldn't be a bad idea. The OP has ego problems, I'm afraid.

zabique avatar Feb 07 '23 11:02 zabique

Okay, I'll try to implement and test it.

Jerry-Master avatar Feb 07 '23 11:02 Jerry-Master

I'm also happy to test model as soon as you got something ready.

zabique avatar Feb 07 '23 11:02 zabique

My fork now supports Batch Normalization in any SAEHD architecture. During execution, the program should ask 'Use BN' at some point; answer 'y' and the newly created model will have Batch Normalization in it. I have tested that nothing breaks with the DF archi. I have not tested loading pretrained models; it would probably break for technical reasons, although adding BN is compatible with existing weights and should work. Right now I have only added support for newly created models.

Jerry-Master avatar Feb 07 '23 13:02 Jerry-Master

Apparently loading previous models without BN didn't cause the problems I expected, so the PR is good to go.

Jerry-Master avatar Feb 07 '23 13:02 Jerry-Master

I couldn't start an old model:

Press enter in 2 seconds to override model settings.
[0] Autobackup every N hour ( 0..24 ?:help ) : 0
[y] Write preview history ( y/n ?:help ) : y
[n] Choose image for the preview history ( y/n ) : n
[0] Target iteration : 0
[n] Flip SRC faces randomly ( y/n ?:help ) : n
[n] Flip DST faces randomly ( y/n ?:help ) : n
[10] Batch_size ( ?:help ) : 10
[y] Masked training ( y/n ?:help ) : y
[n] Eyes and mouth priority ( y/n ?:help ) : n
[n] Uniform yaw distribution of samples ( y/n ?:help ) : n
[y] Blur out mask ( y/n ?:help ) : y
[y] Place models and optimizer on GPU ( y/n ?:help ) : y
[n] Use AdaBelief optimizer? ( y/n ?:help ) : n
[y] Use learning rate dropout ( n/y/cpu ?:help ) : y
[n] Enable random warp of samples ( y/n ?:help ) : n
[0.0] Random hue/saturation/light intensity ( 0.0 .. 0.3 ?:help ) : 0.0
[0.0] GAN power ( 0.0 .. 5.0 ?:help ) : 0.0
[0.0] Face style power ( 0.0..100.0 ?:help ) : 0.0
[0.0] Background style power ( 0.0..100.0 ?:help ) : 0.0
[none] Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : none
[n] Enable gradient clipping ( y/n ?:help ) : n
[n] Enable pretraining mode ( y/n ?:help ) : n

Error: 'use_bn'
Traceback (most recent call last):
  File "G:\DeepFaceLabFREEZER_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "G:\DeepFaceLabFREEZER_internal\DeepFaceLab\models\ModelBase.py", line 193, in init
    self.on_initialize()
  File "G:\DeepFaceLabFREEZER_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 276, in on_initialize
    if self.options['use_bn']:
KeyError: 'use_bn'

zabique avatar Feb 07 '23 13:02 zabique

I got the exact same error. Use the latest commit; it should be fixed there.

Jerry-Master avatar Feb 07 '23 14:02 Jerry-Master

The latest (7th) commit worked.

zabique avatar Feb 07 '23 14:02 zabique

Actually, it didn't:

[CPU] : CPU [0] : NVIDIA GeForce RTX 3090

[0] Which GPU indexes to choose? : 0

Press enter in 2 seconds to override model settings.
[0] Autobackup every N hour ( 0..24 ?:help ) : 0
[y] Write preview history ( y/n ?:help ) : y
[n] Choose image for the preview history ( y/n ) : n
[0] Target iteration : 0
[n] Flip SRC faces randomly ( y/n ?:help ) : n
[n] Flip DST faces randomly ( y/n ?:help ) : n
[10] Batch_size ( ?:help ) : 6
[n] Use BN ( y/n ?:help ) : y
[y] Masked training ( y/n ?:help ) : y
[n] Eyes and mouth priority ( y/n ?:help ) : n
[n] Uniform yaw distribution of samples ( y/n ?:help ) : n
[y] Blur out mask ( y/n ?:help ) : y
[y] Place models and optimizer on GPU ( y/n ?:help ) : y
[n] Use AdaBelief optimizer? ( y/n ?:help ) : n
[y] Use learning rate dropout ( n/y/cpu ?:help ) : y
[n] Enable random warp of samples ( y/n ?:help ) : n
[0.0] Random hue/saturation/light intensity ( 0.0 .. 0.3 ?:help ) : 0.0
[0.0] GAN power ( 0.0 .. 5.0 ?:help ) : 0.0
[0.0] Face style power ( 0.0..100.0 ?:help ) : 0.0
[0.0] Background style power ( 0.0..100.0 ?:help ) : 0.0
[none] Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : none
[n] Enable gradient clipping ( y/n ?:help ) : n
[n] Enable pretraining mode ( y/n ?:help ) : n

Initializing models: 40%|#########################2 | 2/5 [00:01<00:02, 1.02it/s]
Error:
Traceback (most recent call last):
  File "G:\DeepFaceLabFREEZER_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "G:\DeepFaceLabFREEZER_internal\DeepFaceLab\models\ModelBase.py", line 193, in init
    self.on_initialize()
  File "G:\DeepFaceLabFREEZER_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 660, in on_initialize
    do_init = not model.load_weights( self.get_strpath_storage_for_file(filename) )
  File "G:\DeepFaceLabFREEZER_internal\DeepFaceLab\core\leras\layers\Saveable.py", line 72, in load_weights
    d = pickle.loads(d_dumped)
MemoryError

zabique avatar Feb 07 '23 14:02 zabique

That is new. A MemoryError typically appears when loading large files. Try creating, saving, and reloading a new model to see if the error persists.

Jerry-Master avatar Feb 07 '23 14:02 Jerry-Master

Okay, so apparently there was a problem because the current DFL implementation treats trainable and non-trainable weights the same. That was causing running_mean and running_var to be trained, which is wrong. I modified the SAEHD code so that they are not considered trainable. I have tested it with a model (DF) I had already trained and everything works fine. I cannot say whether BN helps until I use it in a real project.
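A sketch of the distinction being made (hypothetical class and method names, not DFL's real API): the optimizer should only ever see gamma/beta, while saving and loading must cover the running statistics too.

```python
import numpy as np

class BatchNorm2D:
    """Toy layer separating trainable parameters from moving statistics."""
    def __init__(self, channels):
        self.gamma = np.ones(channels)          # learnable scale
        self.beta = np.zeros(channels)          # learnable shift
        self.running_mean = np.zeros(channels)  # updated by moving average only
        self.running_var = np.ones(channels)

    def get_trainable_weights(self):
        # Only gamma/beta are handed to the optimizer.
        return [self.gamma, self.beta]

    def get_weights(self):
        # Everything is saved/loaded, trainable or not.
        return [self.gamma, self.beta, self.running_mean, self.running_var]

bn = BatchNorm2D(16)
```

With this split, a gradient step can never touch `running_mean` / `running_var`, which is the bug described above.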

Jerry-Master avatar Feb 07 '23 15:02 Jerry-Master

VRAM consumption went through the roof with BN. Unable to train at 400+ resolution with a reasonable batch size on 24 GB of VRAM.

zabique avatar Feb 08 '23 22:02 zabique

VRAM should increase, but not by much: the mean and variance only have a channel dimension. Maybe the gradient with respect to the covariance shift is at fault here, although again, that should not happen; it is not such a big gradient. In other vision problems where I've used batch normalisation, memory usage wasn't an issue. If the increase is huge, there may be a problem with the way DFL assigns the moving values back or with how broadcasting is done. All the relevant code is in the BN file, so feel free to try some tricks and see if VRAM decreases. I would suggest first removing the moving values and just using the batch mean and variance on the fly. If that still causes the VRAM explosion, you could remove the covariance shift (although this has been shown to work poorly). And if it still explodes, then the problem is computing the mean and the variance themselves.
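The first trick suggested above, sketched in NumPy (illustrative only, not DFL code): normalize with the current batch's statistics and keep no running state at all, so nothing extra has to be stored or assigned back.

```python
import numpy as np

def batch_norm_on_the_fly(x, gamma, beta, eps=1e-5):
    """Normalize x (shape N, H, W, C) with this batch's own statistics.
    gamma/beta have shape (C,). No moving values are kept anywhere."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(4, 8, 8, 16).astype(np.float32)
y = batch_norm_on_the_fly(x, np.ones(16), np.zeros(16))
```

If VRAM is fine with this variant, the moving-value bookkeeping is the likely culprit; if it still explodes, the problem is further down the list.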

An alternative to BN is instance normalization, which works well with generative models. It doesn't have moving values but it does have the covariance shift. Depending on what is causing the VRAM explosion, it may be reasonable to change BN to IN.

Jerry-Master avatar Feb 09 '23 06:02 Jerry-Master

I tried a DF model at 416 res: batch 4 was OOM, where I can normally run the heavier LIAE model at 416 at batch 10 with no issues. Like you mentioned, the problem may be DFL itself. From what I have seen, porting to PyTorch would enable 2x speed and a 5-10x lower VRAM footprint with a newly optimised model. Trying to fix the DFL code is pure necromancy.

zabique avatar Feb 09 '23 09:02 zabique