DeepFaceLab
Auto backup is skipped and a couple of errors.
The auto backup function worked when I first tried it, and I set it to 1h. However, when I start a new project, the auto backup option is set to 0 and the prompt to change it is skipped. I'm not sure why it happens. The only thing I can think of is that on an earlier training (not the same project) I typed a number followed by a question mark (e.g. 4?) into one of the options (I don't believe it was that one), because I wanted to check the help info and didn't realize I had already entered a number.
I changed this line (Notepad: Ln 361, Col 9) in the file ModelBase.py under \DeepFaceLab_NVIDIA_internal\DeepFaceLab\models
if self.autobackup_hour != 1:
The training still skips the autobackup prompt. It now says it should back up every 1h, but it doesn't.
Neither of the training options has given errors, but I upped the settings on SAEHD and now I get this when I start the training, and several times during the training as well:
E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.53G (2719914240 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
It happens randomly, but after 40K iterations there are no visible errors in the result. I also get this error several times when merging SAEHD:
E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.50G (2680068352 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
But again, no visible errors.
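For reference, the byte counts in those two log lines match the sizes TensorFlow prints if "G" is read as GiB (2^30 bytes). A quick check in plain Python; the helper name `bytes_to_gib` is made up for illustration and is not part of DeepFaceLab or TensorFlow:

```python
def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30

# Byte counts copied from the two CUDA_ERROR_OUT_OF_MEMORY lines above.
for n_bytes in (2719914240, 2680068352):
    print(f"{n_bytes} bytes = {bytes_to_gib(n_bytes):.2f} GiB")
```

So each failed request was for roughly 2.5 GiB in one piece.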
Win 10, 64-bit, GTX 1660 with the latest Studio Drivers. DeepFaceLab_NVIDIA_build_02_03_2020 downloaded 14 02 2020.
Maybe try this, in the model folder:
- rename the file SAEHD_default_options.dat to SAEHD_default_options.bak.dat
- run a new model with the same settings as your current model - this time make sure that you enter the autobackup time right ;)
- let it run for a few iterations, then quit
- then train your current model again
- this time, with the "re-created" SAEHD_default_options.dat, all should be fine
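If you would rather script the rename step than do it by hand, here is a minimal sketch. The function name `set_aside_default_options` is hypothetical (it is not part of DeepFaceLab), and it assumes the options file sits directly inside the model folder you pass in:

```python
from pathlib import Path

def set_aside_default_options(model_dir):
    """Rename SAEHD_default_options.dat to SAEHD_default_options.bak.dat.

    Hypothetical helper, not DeepFaceLab code.
    Returns the backup path, or None if there was nothing to rename.
    """
    model_dir = Path(model_dir)
    src = model_dir / "SAEHD_default_options.dat"
    dst = model_dir / "SAEHD_default_options.bak.dat"
    if not src.exists():
        return None
    if dst.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"{dst} already exists; move it away first")
    src.rename(dst)
    return dst
```

Call it with the path to your model folder, then start the new model as described in the steps above. To undo, rename the .bak.dat file back.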
Thanks for the reply! When I started the training and hit enter to change the settings, it showed the autobackup as having the value 1, but as I continued (not changing any options), it errored out:
Initializing models: 40%|#########################2 | 2/5 [00:00<00:00, 3.22it/s]
Error: cannot reshape array of size 9437184 into shape (3,3,512,704)
Traceback (most recent call last):
  File "E:\Programs\DeepFaceLab_NVIDIA_internal\DeepFaceLab\mainscripts\Trainer.py", line 57, in trainerThread
    debug=debug,
  File "E:\Programs\DeepFaceLab_NVIDIA_internal\DeepFaceLab\models\ModelBase.py", line 173, in __init__
    self.on_initialize()
  File "E:\Programs\DeepFaceLab_NVIDIA_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 692, in on_initialize
    do_init = not model.load_weights( self.get_strpath_storage_for_file(filename) )
  File "E:\Programs\DeepFaceLab_NVIDIA_internal\DeepFaceLab\core\leras\layers.py", line 67, in load_weights
    w_val = np.reshape( w_val, w.shape.as_list() )
  File "<__array_function__ internals>", line 6, in reshape
  File "E:\Programs\DeepFaceLab_NVIDIA_internal\python-3.6.8\lib\site-packages\numpy\core\fromnumeric.py", line 301, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "E:\Programs\DeepFaceLab_NVIDIA_internal\python-3.6.8\lib\site-packages\numpy\core\fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 9437184 into shape (3,3,512,704)
So I just reverted back to the bak-file and it's running again (still with the errors in the original post).
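For what it's worth, the reshape error comes down to an element-count mismatch: the array read by load_weights has 9437184 values, while the layer being built expects 3*3*512*704 of them. That would be consistent with the saved weights having been created with different dimension settings than the ones picked up on this run, though the traceback alone doesn't prove it. A quick check in plain Python (requires Python 3.8+ for `math.prod`; no DeepFaceLab code involved). The last line uses a shape chosen purely for illustration, not a claim about what the saved model actually used:

```python
from math import prod

saved_size = 9437184               # from the error message
expected_shape = (3, 3, 512, 704)  # from the error message
expected_size = prod(expected_shape)

print(expected_size)                # 3244032
print(saved_size == expected_size)  # False, hence the ValueError

# Illustration only: one shape whose element count matches the saved size.
print(prod((3, 3, 512, 2048)) == saved_size)  # True
```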
I have 5 auto backups:
05 - 04 = 1h 10m Date modified.
03 - 01 = 1h 10m Date modified.
But between 03 and 04 there's an 8h 3m gap.
I'll let it run and see if it overwrites the backups in sequence.
Hmmm... did you really start a new model by entering a new model name? Then I would be surprised if you see the error message from above.
Alternatively, you might try to rename the entire model folder, let's say to model.bak, then create a new folder named model and start a fresh training. In that case the file SAEHD_default_options.dat should be generated again inside the model folder. Once you have that, copy it into the model.bak folder, rename the model.bak folder back to model and then run your training with the freshly generated SAEHD_default_options.dat. My feeling is that your file was/is somewhat corrupted.
And about the OOMs, well, that's a heavenly gift in the DeepFaceLab program :D
Yeah, just tried it. I also removed both files, started it up, chose a new name, and it skipped the option with the value set at 0. :/
This is going to be a stupid question, but I might as well ask, is the model size set at the beginning?
I switched over to a GTX 1660 just this weekend. Before that I was using an RX 560, and the difference is night and day (everything was a fresh install). I never understood whether the model size increases until it takes up all the memory, or whether the settings dictate how much of the "playing field" it occupies, so to speak.
With the RX 560, 1 iteration at lower settings took 8 - 12 seconds; now it's under 1 second at higher settings. Also, the RX 560 crashed at around 4 - 6K iterations on Quick96, whereas on the 1660 I discovered it was still going at around 200K, at 0.209 seconds / iteration, and I didn't try to go any further.
The old file still works, and if all else fails later I can just unpack a fresh version of the app.
Did you ever find the answer? If so, would you mind sharing it and closing this issue?