
TRAINING ERROR AT SAE

naruto46 opened this issue 3 years ago • 1 comment

While starting training, this error occurs. Can someone please help me resolve it?

Error: OOM when allocating tensor with shape[3,3,512,2016] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node mul_23}} = Mul[T=DT_FLOAT, _class=["loc:@cond_14/Switch_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](truediv_34, gradients/model_2/conv2d_6/convolution_grad/Conv2DBackpropFilter)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "D:\soft\Programs\DeepFaceLab_CUDA_9.2_SSE_internal\DeepFaceLab\mainscripts\Trainer.py", line 109, in trainerThread
    iter, iter_time = model.train_one_iter()
  File "D:\soft\Programs\DeepFaceLab_CUDA_9.2_SSE_internal\DeepFaceLab\models\ModelBase.py", line 525, in train_one_iter
    losses = self.onTrainOneIter(sample, self.generator_list)
  File "D:\soft\Programs\DeepFaceLab_CUDA_9.2_SSE_internal\DeepFaceLab\models\Model_SAE\Model.py", line 509, in onTrainOneIter
    src_loss, dst_loss, = self.src_dst_train (feed)
  File "D:\soft\Programs\DeepFaceLab_CUDA_9.2_SSE_internal\python-3.6.8\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "D:\soft\Programs\DeepFaceLab_CUDA_9.2_SSE_internal\python-3.6.8\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "D:\soft\Programs\DeepFaceLab_CUDA_9.2_SSE_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)
  File "D:\soft\Programs\DeepFaceLab_CUDA_9.2_SSE_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3,3,512,2016] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node mul_23}} = Mul[T=DT_FLOAT, _class=["loc:@cond_14/Switch_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](truediv_34, gradients/model_2/conv2d_6/convolution_grad/Conv2DBackpropFilter)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

naruto46 · Nov 17 '22 18:11
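The hint at the end of the error refers to TensorFlow's RunOptions. A minimal sketch of what it means, assuming the stock TensorFlow 1.x API bundled with this build (this is an illustration with a stand-in graph, not DeepFaceLab's own training code):

```python
import tensorflow as tf

# Stand-in computation; in DeepFaceLab this would be the SAE training step.
x = tf.random_normal([1024, 1024])
y = tf.matmul(x, x)

# With this option, the ResourceExhaustedError raised on OOM also lists the
# tensors currently allocated, which helps pinpoint what is filling the GPU.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    result = sess.run(y, options=run_options)
```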

OOM means "out of memory": the model does not fit in your GPU's memory. Lower your settings (batch size or resolution, for example).

PH2100 · Jan 29 '23 15:01
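For a rough sense of why lowering the settings helps, a back-of-the-envelope calculation based only on the shape reported in the error message (an editorial illustration, not part of the original thread): the single gradient buffer TensorFlow failed to allocate is already about 35 MiB, and the number and size of such buffers grow with batch size, resolution, and model dimensions.

```python
# Size of the one tensor the allocator could not fit, from the OOM message.
shape = (3, 3, 512, 2016)   # conv filter gradient: 3x3 kernel, 512 in / 2016 out channels
bytes_per_elem = 4          # "type float" in the log = 32-bit float
num_elems = 1
for dim in shape:
    num_elems *= dim
print(f"{num_elems * bytes_per_elem / 1024 ** 2:.1f} MiB")  # ~35.4 MiB for this buffer alone
```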

Issue solved / already answered (or it seems like user error), please close it.

joolstorrentecalo · Jun 08 '23 23:06