tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed

Open MHX1203 opened this issue 2 years ago • 3 comments

The following is the training log.

dnnlib: Running training.training_loop.training_loop() on localhost...
GPU available:  True
GPU devices:  /device:GPU:0
>>>>> Create Session
Dataset directory:  .
Streaming data using training.dataset.TFRecordDataset...
tfrecord_dir:  .\custom-images
Dataset shape = [1, 512, 512]
Dynamic range = [0, 255]
Label size    = 0
Constructing networks...

G                             Params    OutputShape         WeightShape     
---                           ---       ---                 ---             
latents_in                    -         (?, 512)            -               
labels_in                     -         (?, 0)              -               
lod                           -         ()                  -               
dlatent_avg                   -         (512,)              -               
G_mapping/latents_in          -         (?, 512)            -               
G_mapping/labels_in           -         (?, 0)              -               
G_mapping/PixelNorm           -         (?, 512)            -               
G_mapping/Dense0              262656    (?, 512)            (512, 512)      
G_mapping/Dense1              262656    (?, 512)            (512, 512)      
G_mapping/Dense2              262656    (?, 512)            (512, 512)      
G_mapping/Dense3              262656    (?, 512)            (512, 512)      
G_mapping/Dense4              262656    (?, 512)            (512, 512)      
G_mapping/Dense5              262656    (?, 512)            (512, 512)      
G_mapping/Dense6              262656    (?, 512)            (512, 512)      
G_mapping/Dense7              4202496   (?, 8192)           (512, 8192)     
G_mapping/Reshape             -         (?, 16, 512)        -               
G_mapping/dlatents_out        -         (?, 16, 512)        -               
Truncation                    -         (?, 16, 512)        -               
G_synthesis/dlatents_in       -         (?, 16, 512)        -               
G_synthesis/4x4/Const         534528    (?, 512, 4, 4)      (512,)          
G_synthesis/4x4/Conv          2885632   (?, 512, 4, 4)      (3, 3, 512, 512)
G_synthesis/ToRGB_lod7        513       (?, 1, 4, 4)        (1, 1, 512, 1)  
G_synthesis/8x8/Conv0_up      2885632   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/8x8/Conv1         2885632   (?, 512, 8, 8)      (3, 3, 512, 512)
G_synthesis/ToRGB_lod6        513       (?, 1, 8, 8)        (1, 1, 512, 1)  
G_synthesis/Upscale2D         -         (?, 1, 8, 8)        -               
G_synthesis/Grow_lod6         -         (?, 1, 8, 8)        -               
G_synthesis/16x16/Conv0_up    2885632   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/16x16/Conv1       2885632   (?, 512, 16, 16)    (3, 3, 512, 512)
G_synthesis/ToRGB_lod5        513       (?, 1, 16, 16)      (1, 1, 512, 1)  
G_synthesis/Upscale2D_1       -         (?, 1, 16, 16)      -               
G_synthesis/Grow_lod5         -         (?, 1, 16, 16)      -               
G_synthesis/32x32/Conv0_up    2885632   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/32x32/Conv1       2885632   (?, 512, 32, 32)    (3, 3, 512, 512)
G_synthesis/ToRGB_lod4        513       (?, 1, 32, 32)      (1, 1, 512, 1)  
G_synthesis/Upscale2D_2       -         (?, 1, 32, 32)      -               
G_synthesis/Grow_lod4         -         (?, 1, 32, 32)      -               
G_synthesis/64x64/Conv0_up    1442816   (?, 256, 64, 64)    (3, 3, 512, 256)
G_synthesis/64x64/Conv1       852992    (?, 256, 64, 64)    (3, 3, 256, 256)
G_synthesis/ToRGB_lod3        257       (?, 1, 64, 64)      (1, 1, 256, 1)  
G_synthesis/Upscale2D_3       -         (?, 1, 64, 64)      -               
G_synthesis/Grow_lod3         -         (?, 1, 64, 64)      -               
G_synthesis/128x128/Conv0_up  426496    (?, 128, 128, 128)  (3, 3, 256, 128)
G_synthesis/128x128/Conv1     279040    (?, 128, 128, 128)  (3, 3, 128, 128)
G_synthesis/ToRGB_lod2        129       (?, 1, 128, 128)    (1, 1, 128, 1)  
G_synthesis/Upscale2D_4       -         (?, 1, 128, 128)    -               
G_synthesis/Grow_lod2         -         (?, 1, 128, 128)    -               
G_synthesis/256x256/Conv0_up  139520    (?, 64, 256, 256)   (3, 3, 128, 64) 
G_synthesis/256x256/Conv1     102656    (?, 64, 256, 256)   (3, 3, 64, 64)  
G_synthesis/ToRGB_lod1        65        (?, 1, 256, 256)    (1, 1, 64, 1)   
G_synthesis/Upscale2D_5       -         (?, 1, 256, 256)    -               
G_synthesis/Grow_lod1         -         (?, 1, 256, 256)    -               
G_synthesis/512x512/Conv0_up  51328     (?, 32, 512, 512)   (3, 3, 64, 32)  
G_synthesis/512x512/Conv1     42112     (?, 32, 512, 512)   (3, 3, 32, 32)  
G_synthesis/ToRGB_lod0        33        (?, 1, 512, 512)    (1, 1, 32, 1)   
G_synthesis/Upscale2D_6       -         (?, 1, 512, 512)    -               
G_synthesis/Grow_lod0         -         (?, 1, 512, 512)    -               
G_synthesis/images_out        -         (?, 1, 512, 512)    -               
G_synthesis/lod               -         ()                  -               
G_synthesis/noise0            -         (1, 1, 4, 4)        -               
G_synthesis/noise1            -         (1, 1, 4, 4)        -               
G_synthesis/noise2            -         (1, 1, 8, 8)        -               
G_synthesis/noise3            -         (1, 1, 8, 8)        -               
G_synthesis/noise4            -         (1, 1, 16, 16)      -               
G_synthesis/noise5            -         (1, 1, 16, 16)      -               
G_synthesis/noise6            -         (1, 1, 32, 32)      -               
G_synthesis/noise7            -         (1, 1, 32, 32)      -               
G_synthesis/noise8            -         (1, 1, 64, 64)      -               
G_synthesis/noise9            -         (1, 1, 64, 64)      -               
G_synthesis/noise10           -         (1, 1, 128, 128)    -               
G_synthesis/noise11           -         (1, 1, 128, 128)    -               
G_synthesis/noise12           -         (1, 1, 256, 256)    -               
G_synthesis/noise13           -         (1, 1, 256, 256)    -               
G_synthesis/noise14           -         (1, 1, 512, 512)    -               
G_synthesis/noise15           -         (1, 1, 512, 512)    -               
images_out                    -         (?, 1, 512, 512)    -               
---                           ---       ---                 ---             
Total                         30114536                                      


D                    Params    OutputShape         WeightShape     
---                  ---       ---                 ---             
images_in            -         (?, 1, 512, 512)    -               
labels_in            -         (?, 0)              -               
lod                  -         ()                  -               
FromRGB_lod0         64        (?, 32, 512, 512)   (1, 1, 1, 32)   
512x512/Conv0        9248      (?, 32, 512, 512)   (3, 3, 32, 32)  
512x512/Conv1_down   18496     (?, 64, 256, 256)   (3, 3, 32, 64)  
Downscale2D          -         (?, 1, 256, 256)    -               
FromRGB_lod1         128       (?, 64, 256, 256)   (1, 1, 1, 64)   
Grow_lod0            -         (?, 64, 256, 256)   -               
256x256/Conv0        36928     (?, 64, 256, 256)   (3, 3, 64, 64)  
256x256/Conv1_down   73856     (?, 128, 128, 128)  (3, 3, 64, 128) 
Downscale2D_1        -         (?, 1, 128, 128)    -               
FromRGB_lod2         256       (?, 128, 128, 128)  (1, 1, 1, 128)  
Grow_lod1            -         (?, 128, 128, 128)  -               
128x128/Conv0        147584    (?, 128, 128, 128)  (3, 3, 128, 128)
128x128/Conv1_down   295168    (?, 256, 64, 64)    (3, 3, 128, 256)
Downscale2D_2        -         (?, 1, 64, 64)      -               
FromRGB_lod3         512       (?, 256, 64, 64)    (1, 1, 1, 256)  
Grow_lod2            -         (?, 256, 64, 64)    -               
64x64/Conv0          590080    (?, 256, 64, 64)    (3, 3, 256, 256)
64x64/Conv1_down     1180160   (?, 512, 32, 32)    (3, 3, 256, 512)
Downscale2D_3        -         (?, 1, 32, 32)      -               
FromRGB_lod4         1024      (?, 512, 32, 32)    (1, 1, 1, 512)  
Grow_lod3            -         (?, 512, 32, 32)    -               
32x32/Conv0          2359808   (?, 512, 32, 32)    (3, 3, 512, 512)
32x32/Conv1_down     2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
Downscale2D_4        -         (?, 1, 16, 16)      -               
FromRGB_lod5         1024      (?, 512, 16, 16)    (1, 1, 1, 512)  
Grow_lod4            -         (?, 512, 16, 16)    -               
16x16/Conv0          2359808   (?, 512, 16, 16)    (3, 3, 512, 512)
16x16/Conv1_down     2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
Downscale2D_5        -         (?, 1, 8, 8)        -               
FromRGB_lod6         1024      (?, 512, 8, 8)      (1, 1, 1, 512)  
Grow_lod5            -         (?, 512, 8, 8)      -               
8x8/Conv0            2359808   (?, 512, 8, 8)      (3, 3, 512, 512)
8x8/Conv1_down       2359808   (?, 512, 4, 4)      (3, 3, 512, 512)
Downscale2D_6        -         (?, 1, 4, 4)        -               
FromRGB_lod7         1024      (?, 512, 4, 4)      (1, 1, 1, 512)  
Grow_lod6            -         (?, 512, 4, 4)      -               
4x4/MinibatchStddev  -         (?, 513, 4, 4)      -               
4x4/Conv             2364416   (?, 512, 4, 4)      (3, 3, 513, 512)
4x4/Dense0           4194816   (?, 512)            (8192, 512)     
4x4/Dense1           513       (?, 1)              (512, 1)        
scores_out           -         (?, 1)              -               
---                  ---       ---                 ---             
Total                23075169                                      

Building TensorFlow graph...
Setting up snapshot image grid...
Setting up run dir...
Training...

Traceback (most recent call last):
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(16, 8192), b.shape=(16, 512), m=8192, n=512, k=16
	 [[{{node GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU0/D_loss/D_1/4x4/Dense0/Reshape, GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/add_grad/Reshape)]]
	 [[{{node TrainD/ApplyGrads0/UpdateWeights/cond/pred_id/_1585}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24297_TrainD/ApplyGrads0/UpdateWeights/cond/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 193, in <module>
    main()
  File "train.py", line 188, in main
    dnnlib.submit_run(**kwargs)
  File "D:\dy\idinvert\dnnlib\submission\submit.py", line 290, in submit_run
    run_wrapper(submit_config)
  File "D:\dy\idinvert\dnnlib\submission\submit.py", line 242, in run_wrapper
    util.call_func_by_name(func_name=submit_config.run_func_name, submit_config=submit_config, **submit_config.run_func_kwargs)
  File "D:\dy\idinvert\dnnlib\util.py", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\training_loop.py", line 231, in training_loop
    tflib.run([D_train_op, Gs_update_op], {lod_in: sched.lod, lrate_in: sched.D_lrate, minibatch_in: sched.minibatch})
  File "D:\dy\idinvert\dnnlib\tflib\tfutil.py", line 26, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(16, 8192), b.shape=(16, 512), m=8192, n=512, k=16
	 [[node GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1 (defined at D:\dy\idinvert\dnnlib\tflib\optimizer.py:98)  = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU0/D_loss/D_1/4x4/Dense0/Reshape, GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/add_grad/Reshape)]]
	 [[{{node TrainD/ApplyGrads0/UpdateWeights/cond/pred_id/_1585}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24297_TrainD/ApplyGrads0/UpdateWeights/cond/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1', defined at:
  File "train.py", line 193, in <module>
    main()
  File "train.py", line 188, in main
    dnnlib.submit_run(**kwargs)
  File "D:\dy\idinvert\dnnlib\submission\submit.py", line 290, in submit_run
    run_wrapper(submit_config)
  File "D:\dy\idinvert\dnnlib\submission\submit.py", line 242, in run_wrapper
    util.call_func_by_name(func_name=submit_config.run_func_name, submit_config=submit_config, **submit_config.run_func_kwargs)
  File "D:\dy\idinvert\dnnlib\util.py", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\training_loop.py", line 184, in training_loop
    D_opt.register_gradients(tf.reduce_mean(D_loss), D_gpu.trainables)
  File "D:\dy\idinvert\dnnlib\tflib\optimizer.py", line 98, in register_gradients
    grads = self._dev_opt[dev].compute_gradients(loss, trainable_vars, gate_gradients=tf.train.Optimizer.GATE_NONE)  # disable gating to reduce memory usage
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\training\optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 814, in _GradientsHelper
    lambda: grad_fn(op, *out_grads))
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 408, in _MaybeCompile
    return grad_fn()  # Exit early
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 814, in <lambda>
    lambda: grad_fn(op, *out_grads))
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\math_grad.py", line 1131, in _MatMulGrad
    grad_b = gen_math_ops.mat_mul(a, grad, transpose_a=True)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4560, in mat_mul
    name=name)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

...which was originally created as op 'GPU0/D_loss/D_1/4x4/Dense0/MatMul', defined at:
  File "train.py", line 193, in <module>
    main()
[elided 3 identical lines from previous traceback]
  File "D:\dy\idinvert\dnnlib\util.py", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\training_loop.py", line 182, in training_loop
    D_loss = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=D_opt, training_set=training_set, minibatch_size=minibatch_split, reals=reals, labels=labels, **D_loss_args)
  File "D:\dy\idinvert\dnnlib\util.py", line 257, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "D:\dy\idinvert\training\loss.py", line 154, in D_logistic_simplegp
    fake_scores_out = fp32(D.get_output_for(fake_images_out, labels, is_training=True))
  File "D:\dy\idinvert\dnnlib\tflib\network.py", line 222, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "D:\dy\idinvert\training\networks_stylegan.py", line 654, in D_basic
    scores_out = grow(2, resolution_log2 - 2)
  File "D:\dy\idinvert\training\networks_stylegan.py", line 651, in grow
    x = block(x(), res); y = lambda: x
  File "D:\dy\idinvert\training\networks_stylegan.py", line 619, in block
    x = act(apply_bias(dense(x, fmaps=nf(res-2), gain=gain, use_wscale=use_wscale)))
  File "D:\dy\idinvert\training\networks_stylegan.py", line 159, in dense
    return tf.matmul(x, w)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\math_ops.py", line 2057, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4560, in mat_mul
    name=name)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "D:\Unet\anaconda\envs\tf112\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(16, 8192), b.shape=(16, 512), m=8192, n=512, k=16
	 [[node GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/MatMul_grad/MatMul_1 (defined at D:\dy\idinvert\dnnlib\tflib\optimizer.py:98)  = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](GPU0/D_loss/D_1/4x4/Dense0/Reshape, GPU0/TrainD_grad/gradients/GPU0/D_loss/D_1/4x4/Dense0/add_grad/Reshape)]]
	 [[{{node TrainD/ApplyGrads0/UpdateWeights/cond/pred_id/_1585}} = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_24297_TrainD/ApplyGrads0/UpdateWeights/cond/pred_id", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Is this problem caused by a large batch_size? When I reduce the batch_size, the problem still occurs.
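One rough way to judge whether batch size is a likely culprit is to read the GEMM dimensions out of the error message itself. The sketch below parses `m`, `n` and `k` from the message in the log above and estimates the float32 storage of the two operands and the result. The helper name `gemm_footprint` is hypothetical, and the estimate covers only this single op, not the total GPU memory used by the training graph.

```python
import re

def gemm_footprint(message, bytes_per_elem=4):
    """Estimate operand and result sizes (in bytes) for the GEMM named in a
    "Blas GEMM launch failed" message. Hypothetical helper, float32 assumed."""
    # Pick up the standalone m=, n=, k= fields; \b avoids matching inside
    # longer names such as "send_device_incarnation=1".
    dims = {key: int(val) for key, val in re.findall(r"\b([mnk])=(\d+)", message)}
    m, n, k = dims["m"], dims["n"], dims["k"]
    return {
        "a_bytes": m * k * bytes_per_elem,    # first operand
        "b_bytes": k * n * bytes_per_elem,    # second operand
        "out_bytes": m * n * bytes_per_elem,  # result (here, the Dense0 weight gradient)
    }

msg = ("Blas GEMM launch failed : a.shape=(16, 8192), b.shape=(16, 512), "
       "m=8192, n=512, k=16")
sizes = gemm_footprint(msg)
print({name: size / 2**20 for name, size in sizes.items()})  # sizes in MiB
```

In this message `k=16` is the only dimension that scales with the minibatch, and the result has the same shape as the `4x4/Dense0` weight, `(8192, 512)`, whatever the batch size. That is at least consistent with the observation that lowering batch_size made no difference, although it does not rule out memory pressure elsewhere in the graph.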

MHX1203, May 01 '22 11:05

You can try training on images at 256x256 resolution and see whether the problem still happens.

zhujiapeng, May 03 '22 07:05

The problem still occurred.

MHX1203, May 15 '22 11:05

Your environment may be causing it. I found some possible solutions, such as here and here; see if these help.
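As one illustration of an environment check (an assumption on my part, not necessarily what the linked solutions describe), it can help to see whether another process is already holding the GPU before training starts. The sketch below calls `nvidia-smi` if it is on the PATH and lists the compute processes it reports. The function names are hypothetical, and on some Windows driver configurations the memory column may come back as `[N/A]`.

```python
import shutil
import subprocess

def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        if line.count(",") < 2:
            continue  # skip blank or unexpected lines
        pid, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        rows.append({"pid": pid.strip(), "name": name.strip(), "memory": memory.strip()})
    return rows

def list_gpu_processes():
    """Return the compute processes nvidia-smi reports, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # nvidia-smi is not on the PATH
    result = subprocess.run(
        [exe, "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        return None  # the query failed, e.g. because of a driver problem
    return parse_compute_apps(result.stdout)

print(list_gpu_processes())
```

If this lists a stale Python process from an earlier run, closing it before restarting train.py is a cheap thing to try. An empty list would point more towards a driver, CUDA or cuBLAS version mismatch with the installed TensorFlow build.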

zhujiapeng, May 15 '22 11:05