Thank you very much for the great tutorial.
My GPU runs out of memory and training fails right at the beginning. I would appreciate it if you could take a look at the error below.
Here is my error:
`2020-05-20 11:18:44.904340: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 36.00MiB (rounded to 37748736). Current allocation summary follows.
2020-05-20 11:18:44.914558: W tensorflow/core/common_runtime/bfc_allocator.cc:319] **********************************************************************************************xx
2020-05-20 11:18:44.919358: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[3,3,512,4,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,3,3,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node GPU0/G_loss/G/G_synthesis/128x128/Conv1/Square}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run_training.py", line 192, in
main()
File "run_training.py", line 187, in main
run(**vars(args))
File "run_training.py", line 120, in run
dnnlib.submit_run(**kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
return run_wrapper(submit_config)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 299, in training_loop
tflib.run(G_train_op, feed_dict)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\tfutil.py", line 31, in run
return tf.get_default_session().run(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
run_metadata_ptr)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
run_metadata)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,3,3,128,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node GPU0/G_loss/G/G_synthesis/128x128/Conv1/Square (defined at C:\Users\USER6459\Documents\python\stylegan2\training\networks_stylegan2.py:104) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Errors may have originated from an input operation.
Input Source operations connected to node GPU0/G_loss/G/G_synthesis/128x128/Conv1/Square:
GPU0/G_loss/G/G_synthesis/128x128/Conv1/mul_3 (defined at C:\Users\USER6459\Documents\python\stylegan2\training\networks_stylegan2.py:100)
Original stack trace for 'GPU0/G_loss/G/G_synthesis/128x128/Conv1/Square':
File "run_training.py", line 192, in
main()
File "run_training.py", line 187, in main
run(**vars(args))
File "run_training.py", line 120, in run
dnnlib.submit_run(**kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
return run_wrapper(submit_config)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 220, in training_loop
G_loss, G_reg = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_gpu_in, **G_loss_args)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\util.py", line 256, in call_func_by_name
return func_obj(*args, **kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\loss.py", line 152, in G_logistic_ns_pathreg
fake_images_out, fake_dlatents_out = G.get_output_for(latents, labels, is_training=True, return_dlatents=True)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
out_expr = self._build_func(*final_inputs, **build_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\networks_stylegan2.py", line 238, in G_main
images_out = components.synthesis.get_output_for(dlatents, is_training=is_training, force_clean_graph=is_template_graph, **kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
out_expr = self._build_func(final_inputs, **build_kwargs)
File "C:\Users\USER6459\Documents\python\stylegan2\training\networks_stylegan2.py", line 498, in G_synthesis_stylegan2
x = block(x, res)
File "C:\Users\USER6459\Documents\python\stylegan2\training\networks_stylegan2.py", line 470, in block
x = layer(x, layer_idx=res2-4, fmaps=nf(res-1), kernel=3)
File "C:\Users\USER6459\Documents\python\stylegan2\training\networks_stylegan2.py", line 455, in layer
x = modulated_conv2d_layer(x, dlatents_in[:, layer_idx], fmaps=fmaps, kernel=kernel, up=up, resample_kernel=resample_kernel, fused_modconv=fused_modconv)
File "C:\Users\USER6459\Documents\python\stylegan2\training\networks_stylegan2.py", line 104, in modulated_conv2d_layer
d = tf.rsqrt(tf.reduce_sum(tf.square(ww), axis=[1,2,3]) + 1e-8) # [BO] Scaling factor.
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 10698, in square
"Square", x=x, name=name)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
op_def=op_def)
File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in init
self._traceback = tf_stack.extract_stack()`
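
As a sanity check on the numbers in the log, here is a small sketch of my own (not from the tutorial or the StyleGAN2 code) that computes how large the two tensors named in the OOM messages are, assuming 4 bytes per element since the log reports type float. The helper names `tensor_bytes` and `to_mib` are made up for illustration.

```python
from math import prod  # requires Python 3.8+

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the log means float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 ** 2)


# Shapes copied from the OOM messages in the log above.
shapes_from_log = {
    "transpose_op.cc:199": (3, 3, 512, 4, 512),
    "128x128/Conv1/Square": (4, 3, 3, 128, 128),
}

for name, shape in shapes_from_log.items():
    size = tensor_bytes(shape)
    print(f"{name}: {size} bytes = {to_mib(size):.2f} MiB")
```

The first shape comes out to 37748736 bytes, which matches the "36.00MiB (rounded to 37748736)" in the first line of the log, and the second is about 2.25 MiB. Both are tiny compared to 11 GB, so it looks like the GPU memory is already almost full by the time these allocations are requested, rather than any single tensor being too big.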
I am using a workstation with one RTX 2080 Ti (11 GB) and 128 GB of RAM.
The CUDA version is 10.0.
Even if I reduce the batch size, it still fails for lack of memory.
My dataset is 500 pictures at 1024x1024.
I tried a bigger dataset and a smaller dataset, and tried both PNG and JPG.
I tried all available configs (a, b, c, d, e, f).
I also tried running it on another computer with a GTX 1060, and CUDA fails there as well.
I tried a 512x512 image dataset.
I tried a 256x256 image dataset.
< ERROR CUDA RUN OUT OF MEMORY >
Please note that I am trying to train my own models on building shapes, not human faces.
Thank you in advance.