perceptual-reflection-removal std::bad_alloc with pre-trained test

Open bonfry opened this issue 2 years ago • 0 comments

Hi, I tried to test reflection removal with the pre-trained model on my own images, using this command:

python3 main.py --task pre-trained --is_training 0

However, the process crashes, throwing an instance of std::bad_alloc.

I ran this command on a WSL instance (i7-10875H, 32 GB RAM, RTX 3070 Laptop GPU). I compiled the kernel with NUMA support:

CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_ACPI_NUMA=y
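
To confirm that the running kernel (rather than only the build configuration) has these options set, the kernel config can be inspected from Python. This is a minimal sketch, assuming the kernel exposes its configuration at /proc/config.gz, which is only the case when it was built with CONFIG_IKCONFIG_PROC. The helper names `numa_options_enabled` and `read_running_kernel_config` are hypothetical and not part of the repository.

```python
import gzip
from pathlib import Path

# The four options listed above in this issue.
NUMA_OPTIONS = (
    "CONFIG_NUMA",
    "CONFIG_K8_NUMA",
    "CONFIG_X86_64_ACPI_NUMA",
    "CONFIG_ACPI_NUMA",
)


def numa_options_enabled(config_text, options=NUMA_OPTIONS):
    """Map each option name to True if the config text sets it to 'y'."""
    enabled = {name: False for name in options}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in enabled:
            enabled[key] = (value == "y")
    return enabled


def read_running_kernel_config(path="/proc/config.gz"):
    """Return the running kernel's config text, or None if it is not exposed.

    Assumption: the file only exists when the kernel was built with
    CONFIG_IKCONFIG_PROC.
    """
    config_path = Path(path)
    if not config_path.exists():
        return None
    with gzip.open(config_path, "rt") as handle:
        return handle.read()
```

Calling `numa_options_enabled(read_running_kernel_config())` inside the WSL instance, after checking that the config text is not None, shows whether the kernel that is actually booted matches the configuration listed above.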

Process log

python3 main.py --task pre-trained --is_training 0
[i] Loaded pre-trained vgg19 parameters
[i] Hypercolumn ON, building hypercolumn features ...
Listing trainable variables ...
<tf.Variable 'g_conv0/weights:0' shape=(1, 1, 1475, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv0/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv0/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv0/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv1/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv1/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv1/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv1/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv2/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv2/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv2/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv2/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv3/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv3/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv3/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv3/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv4/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv4/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv4/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv4/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv5/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv5/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv5/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv5/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv6/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv6/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv6/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv6/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv7/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv7/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv7/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv7/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv9/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv9/w0:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv9/w1:0' shape=() dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv9/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv_last/weights:0' shape=(1, 1, 64, 6) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'g_conv_last/biases:0' shape=(6,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_1/conv/filter:0' shape=(4, 4, 6, 64) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_2/conv/filter:0' shape=(4, 4, 64, 128) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_2/batchnorm/offset:0' shape=(128,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_2/batchnorm/scale:0' shape=(128,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_3/conv/filter:0' shape=(4, 4, 128, 256) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_3/batchnorm/offset:0' shape=(256,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_3/batchnorm/scale:0' shape=(256,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_4/conv/filter:0' shape=(4, 4, 256, 512) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_4/batchnorm/offset:0' shape=(512,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_4/batchnorm/scale:0' shape=(512,) dtype=float32_ref>
Listing trainable variables ...
<tf.Variable 'discriminator/layer_5/conv/filter:0' shape=(4, 4, 512, 1) dtype=float32_ref>
2022-03-13 12:26:11.065267: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2022-03-13 12:26:13.884531: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-03-13 12:26:13.884731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: NVIDIA GeForce RTX 3070 Laptop GPU major: 8 minor: 6 memoryClockRate(GHz): 1.56
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.98GiB
2022-03-13 12:26:13.884883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2022-03-13 12:26:18.254634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-13 12:26:18.254792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2022-03-13 12:26:18.254833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2022-03-13 12:26:18.255209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1193] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted
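
The NUMA warning in the log says TensorFlow could not open /sys/bus/pci/devices/0000:01:00.0/numa_node. The following is a minimal sketch for checking that file directly. The function name `read_numa_node` is hypothetical, and whether the NUMA warning is related to the std::bad_alloc crash is not established by this log.

```python
from pathlib import Path


def read_numa_node(pci_bus_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node sysfs reports for a PCI device.

    Returns None when the numa_node file is missing or unreadable, which
    is the situation the TensorFlow warning describes.
    """
    node_file = Path(sysfs_root) / pci_bus_id / "numa_node"
    try:
        return int(node_file.read_text().strip())
    except (FileNotFoundError, NotADirectoryError, PermissionError, ValueError):
        return None


if __name__ == "__main__":
    # PCI bus ID taken from the "pciBusID" line in the log above.
    node = read_numa_node("0000:01:00.0")
    if node is None:
        print("numa_node file is missing or unreadable")
    else:
        print(f"numa_node = {node}")
```

If this prints that the file is missing, the sysfs entry is absent inside the WSL instance, regardless of the kernel build options.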

Mar 13 '22 11:03 bonfry
