Errors while generating Image with GPU

Open Lauris1988 opened this issue 2 years ago • 10 comments

Errors while trying to generate images with GPU:

XlaRuntimeError                           Traceback (most recent call last)
Input In [13], in <cell line: 9>()
     23 encoded_images = encoded_images.sequences[..., 1:]
     24 # decode images
---> 25 decoded_images = p_decode(encoded_images, vqgan_params)
     26 decoded_images = decoded_images.clip(0.0, 1.0).reshape((-1, 256, 256, 3))
     27 for decoded_img in decoded_images:

    [... skipping hidden 15 frame]

File /usr/local/lib/python3.8/dist-packages/jax/_src/dispatch.py:713, in backend_compile(backend, built_c, options)
    709 @profiler.annotate_function
    710 def backend_compile(backend, built_c, options):
    711   # we use a separate function call to ensure that XLA compilation appears
    712   # separately in Python profiling results
--> 713   return backend.compile(built_c, compile_options=options)

XlaRuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{2,1,3,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{2,1,3,0} %bitcast.220, f32[1,1,256,256]{1,0,2,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/usr/local/lib/python3.8/dist-packages/flax/linen/linear.py" source_line=425}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"

Original error: INTERNAL: All algorithms tried for %cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{2,1,3,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{2,1,3,0} %bitcast.220, f32[1,1,256,256]{1,0,2,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/usr/local/lib/python3.8/dist-packages/flax/linen/linear.py" source_line=425}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}" failed. Falling back to default algorithm.  Per-algorithm errors:
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'

To ignore this failure and try to use a fallback algorithm (which may have suboptimal performance), use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false.  Please also file a bug for the root cause of failing autotuning.
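The last line of the log suggests relaxing the strict convolution algorithm picker. A minimal sketch of setting that flag from Python follows, assuming it runs before JAX initializes its GPU backend. `add_xla_flag` is a hypothetical helper, not part of JAX or dalle-mini. Note that this flag only lets XLA fall back to a default algorithm; it does not address the `CUDNN_STATUS_ALLOC_FAILED` errors themselves.

```python
import os


def add_xla_flag(flag: str) -> str:
    """Append `flag` to XLA_FLAGS, keeping any flags that are already set.

    Hypothetical helper for illustration only.
    """
    current = os.environ.get("XLA_FLAGS", "").split()
    if flag not in current:
        current.append(flag)
    os.environ["XLA_FLAGS"] = " ".join(current)
    return os.environ["XLA_FLAGS"]


# Flag name copied verbatim from the error message above.
add_xla_flag("--xla_gpu_strict_conv_algorithm_picker=false")
```

Appending rather than overwriting avoids silently dropping other XLA flags that may already be configured in the environment.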

Lauris1988 avatar Jun 25 '22 17:06 Lauris1988

I'm getting this error too. The CUDNN_STATUS_ALLOC_FAILED messages suggest the GPU has run out of memory; see:

https://github.com/google/jax/issues/8506

aghadjip avatar Jul 18 '22 00:07 aghadjip

I tried setting the environment variable as explained in the referenced issue, but I still get the same error:

XlaRuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{3,2,1,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{3,2,1,0} %bitcast.106, f32[1,1,256,256]{2,1,0,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/VQModule.decode_code/VQModule.decode/post_quant_conv/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/home/metal3d/Projects/ML/dalle/.v/lib64/python3.10/site-packages/flax/linen/linear.py" source_line=370}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"

Original error: INTERNAL: All algorithms tried for %cudnn-conv-bias-activation.2 = (f32[2,16,16,256]{3,2,1,0}, u8[0]{0}) custom-call(f32[2,16,16,256]{3,2,1,0} %bitcast.106, f32[1,1,256,256]{2,1,0,3} %copy, f32[256]{0} %get-tuple-element.341), window={size=1x1}, dim_labels=b01f_01io->b01f, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_name="pmap(p_decode)/jit(main)/VQModule.decode_code/VQModule.decode/post_quant_conv/conv_general_dilated[window_strides=(1, 1) padding=((0, 0), (0, 0)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 3, 1, 2), rhs_spec=(3, 2, 0, 1), out_spec=(0, 3, 1, 2)) feature_group_count=1 batch_group_count=1 lhs_shape=(2, 16, 16, 256) rhs_shape=(1, 1, 256, 256) precision=None preferred_element_type=None]" source_file="/home/metal3d/Projects/ML/dalle/.v/lib64/python3.10/site-packages/flax/linen/linear.py" source_line=370}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}" failed. Falling back to default algorithm.  Per-algorithm errors:
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1#TC: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'
  Profiling failure on cuDNN engine 1: UNKNOWN: CUDNN_STATUS_ALLOC_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(4839): 'status'

metal3d avatar Jul 21 '22 15:07 metal3d

This works:

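import os
# Set this before JAX initializes the GPU backend (safest: before importing jax).
# It caps the fraction of GPU memory that JAX preallocates.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.8"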
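A slightly more defensive version of the two lines above, as a sketch. My guess at why it helps is that cuDNN needs working memory outside the pool JAX preallocates, so a lower fraction leaves it some headroom. `limit_jax_gpu_memory` is a hypothetical helper name. `XLA_PYTHON_CLIENT_MEM_FRACTION` and `XLA_PYTHON_CLIENT_PREALLOCATE` are the environment variables described in JAX's GPU memory allocation notes. The `"jax" in sys.modules` check is a conservative assumption that the variables should be set before JAX is imported.

```python
import os
import sys
import warnings


def limit_jax_gpu_memory(fraction: float = 0.8, preallocate: bool = True) -> None:
    """Cap the share of GPU memory that JAX preallocates.

    Hypothetical helper for illustration; call it before importing jax.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"fraction must be in (0, 1], got {fraction}")
    if "jax" in sys.modules:
        # Conservative assumption: once JAX is loaded the setting may be ignored.
        warnings.warn("jax is already imported; the setting may have no effect")
    os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(fraction)
    if not preallocate:
        # Allocate on demand instead of reserving a pool up front.
        os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"


limit_jax_gpu_memory(0.8)
# import jax  # import JAX only after the variables are set
```

If 0.8 still fails on a smaller GPU, trying a lower fraction or `preallocate=False` would be the next thing to test, at some cost in allocation speed.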

metal3d avatar Jul 21 '22 16:07 metal3d
