age-gender-estimation

"Illegal instruction (core dumped)" when running the program with TensorFlow on GPU

Open galoiscch opened this issue 7 years ago • 20 comments

I succeeded in running the program with TensorFlow on the CPU. However, I can't run it with TensorFlow on the GPU. The following error appears when I run the program:

Using TensorFlow backend.
2017-07-05 10:18:44.115782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-05 10:18:44.116126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.72GiB
2017-07-05 10:18:44.116175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-05 10:18:44.116189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-05 10:18:44.116214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
Illegal instruction (core dumped)

Is this program compatible with TensorFlow on GPU? My system is: Ubuntu 16.04, Python 2.7.12, Keras 2.0.5, TensorFlow 1.2.0, CUDA 8.0 (V8.0.61), cuDNN 6.0.

galoiscch, Jul 05 '17

Update: I now realize that I had never actually tried this age-estimation program on this computer; I had only gotten a successful result on another computer with an i5 CPU. The problem with this computer is its very old CPU (E5200), which is not supported by the dlib binary installed from a .whl (sudo pip install dlib). The solution is described here: https://github.com/davisking/dlib/issues/620. If you download dlib and compile it yourself, the build will match your computer's hardware configuration.
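"Illegal instruction" generally means a prebuilt binary executed a CPU instruction the processor does not implement, so it can help to check which SIMD extensions the CPU reports before choosing build options. A minimal sketch for Linux, assuming the x86 "flags" line format of /proc/cpuinfo (other architectures use different keys); cpu_flags and missing_instruction_sets are illustrative helpers, not part of dlib:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the text of Linux /proc/cpuinfo (x86 layout)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # All logical CPUs normally report the same flags, so the first match is enough.
            return set(value.split())
    return set()


def missing_instruction_sets(flags, wanted=("sse2", "sse4_1", "sse4_2", "avx")):
    """List the wanted instruction-set extensions that the CPU does not report."""
    return [name for name in wanted if name not in flags]
```

For example, on the machine that crashes, missing_instruction_sets(cpu_flags(open("/proc/cpuinfo").read())) lists the extensions that should stay disabled when compiling dlib.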

I downloaded dlib from here: https://github.com/davisking/dlib/

Before compiling dlib, I edited dlib's tools/python/CMakeLists.txt file from:

set(USE_SSE4_INSTRUCTIONS ON CACHE BOOL "Use SSE4 instructions")

to:

set(USE_SSE2_INSTRUCTIONS ON CACHE BOOL "Use SSE2 instructions")

Then I ran:

python3 setup.py install
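The CMakeLists.txt edit can also be scripted, which may help when rebuilding dlib on several old machines. A sketch that assumes the option line appears exactly as quoted above; patch_cmake is a hypothetical helper, and newer dlib releases may organize their build options differently:

```python
from pathlib import Path

# The exact option lines quoted in this thread (assumed unchanged in the dlib checkout).
OLD = 'set(USE_SSE4_INSTRUCTIONS ON CACHE BOOL "Use SSE4 instructions")'
NEW = 'set(USE_SSE2_INSTRUCTIONS ON CACHE BOOL "Use SSE2 instructions")'


def patch_cmake(path):
    """Swap the SSE4 option for the SSE2 one; return True if the file was changed."""
    p = Path(path)
    text = p.read_text()
    if OLD not in text:
        # Already patched, or this dlib version lays the file out differently.
        return False
    p.write_text(text.replace(OLD, NEW))
    return True
```

After patch_cmake("tools/python/CMakeLists.txt") returns True, the build is run as before.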

But now I have run into another problem. After I start the program, a window showing the webcam image pops up. However, as soon as the webcam captures a human face, the program crashes with the following error:

Using TensorFlow backend.
2017-07-06 09:35:29.039507: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-06 09:35:29.039853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.71GiB
2017-07-06 09:35:29.039903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-06 09:35:29.039917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-06 09:35:29.039941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
2017-07-06 09:35:33.377692: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-07-06 09:35:33.377776: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-07-06 09:35:33.377796: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)

The error "could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" only appears when a face is captured. P.S. I changed the program slightly: I added the line "if len(results)>0:" before the line "predicted_genders = results[0]", so that the window still pops up even when there is no face in it.
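The guard described above can be sketched as a small function that skips prediction when a frame has no faces. This assumes the model's output is [gender_probs (n, 2), age_probs (n, 101)] and that the age is read out as the probability-weighted sum over the 101 age bins; predict_faces is a hypothetical name, and predict stands in for model.predict:

```python
import numpy as np


def predict_faces(predict, faces):
    """Run `predict` only when at least one face crop is present.

    `predict` stands in for model.predict and is assumed to return
    [gender_probs of shape (n, 2), age_probs of shape (n, 101)].
    """
    if len(faces) == 0:
        # No face in this frame: return empty results instead of indexing results[0].
        return np.empty((0, 2)), np.empty((0,))
    results = predict(faces)
    predicted_genders = results[0]
    ages = np.arange(0, 101).reshape(101, 1)
    # Expected age: probability-weighted sum over the age bins 0..100.
    predicted_ages = results[1].dot(ages).flatten()
    return predicted_genders, predicted_ages
```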

galoiscch, Jul 06 '17

Update: I suspect the problem stems from TensorFlow's memory allocation. Since I didn't find a way to limit the GPU's memory usage when using Keras with the TensorFlow backend (e.g. with gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)), I switched to Keras with Theano. It works. However, the ages are misjudged by a relatively large amount, and the results are worse than the output produced on the CPU (i5). So I wonder whether this program is incompatible with Theano, or whether the problem is just the insufficient computing power of my GPU (GTX 1050).
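For reference, Keras 2.x with a TensorFlow 1.x backend can be given a custom session, which does allow limiting GPU memory; enabling allow_growth is also a commonly reported workaround for CUDNN_STATUS_INTERNAL_ERROR on small GPUs. A sketch, not tested in this thread, assuming TensorFlow 1.x (in TensorFlow 2, ConfigProto and Session live under tf.compat.v1):

```python
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of reserving almost all of it up front.
config.gpu_options.allow_growth = True
# Alternatively (or additionally), cap this process at about a third of the GPU's memory.
config.gpu_options.per_process_gpu_memory_fraction = 0.333
# Make Keras use a session with these options; do this before building or loading the model.
K.set_session(tf.Session(config=config))
```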

galoiscch, Jul 07 '17

Thank you for the useful information. First, I fixed demo.py following your comment about adding the line "if len(results)>0:".

As I have not tried training the model with the Theano backend, I'm not sure my program is fully compatible with Theano, but I think it is. I suspect the problem lies in using weights obtained with TensorFlow: I'm afraid that weights trained with one backend are not compatible with the other. You can convert the weights in either direction as explained here to solve this problem.
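For context: Theano's convolution flips the kernel (true convolution) while TensorFlow computes cross-correlation, so converting weights between the two backends amounts to reversing each kernel's spatial axes. A NumPy sketch of that flip, modeled on my reading of keras.utils.conv_utils.convert_kernel in Keras 2.x (flip_kernel is an illustrative name):

```python
import numpy as np


def flip_kernel(kernel):
    """Reverse every spatial axis of a conv kernel.

    The last two axes (input channels, output channels) are left untouched,
    matching Keras' (rows, cols, in, out) kernel layout for Conv2D.
    """
    kernel = np.asarray(kernel)
    slices = [slice(None, None, -1)] * (kernel.ndim - 2) + [slice(None), slice(None)]
    return np.copy(kernel[tuple(slices)])
```

Because flipping twice restores the original kernel, the same operation converts in either direction, but it must be applied exactly once.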

yu4u, Jul 13 '17

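import os

from keras import backend as K
from keras.layers import Conv1D, Conv2D
from keras.utils.conv_utils import convert_kernel

from wide_resnet import WideResNet

img_size = 64
model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

# In Keras 2 the layer classes are Conv1D/Conv2D (Convolution2D is only an alias),
# so matching on the names 'Convolution1D'/'Convolution2D' never converts anything.
# Keras 2 also stores the weights in `layer.kernel` rather than `layer.W`.
for layer in model.layers:
    if isinstance(layer, (Conv1D, Conv2D)):
        # Flip the kernel: TensorFlow-style weights become Theano-style (and vice versa).
        original_w = K.get_value(layer.kernel)
        converted_w = convert_kernel(original_w)
        K.set_value(layer.kernel, converted_w)

model.save_weights(os.path.join("pretrained_models", "weights.18-4.06_theano.h5"))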

Will this Python script convert the weight file correctly? I tried using 'weights.18-4.06_theano.h5', but the output is the same: the predicted age for most people is around 40.

galoiscch, Jul 17 '17

According to the instructions I referred to, the above code should work, but it does not work for me either... I trained the model with the Theano backend, so please try it: https://drive.google.com/file/d/0B_cG1nzvVZlQWGJMc2JjdzkwcVk/view?usp=sharing

yu4u, Jul 18 '17

Thanks a lot.

galoiscch, Jul 18 '17

How long does training take? I trained on the CPU using the wiki dataset, and it only reached the fourth epoch in one day. What is your computer's hardware configuration?

galoiscch, Jul 18 '17

I trained on a GPU machine (CPU: i7-7700 3.60GHz, GPU: GeForce GTX 1080). Training takes 1-2 hours for imdb and 6 minutes for wiki.

If the problem is memory allocation, please try a smaller model and a smaller batch size:

python3 train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 16

If the image size is 64, the number of parameters can also be reduced by changing

pool = AveragePooling2D(pool_size=(8, 8), strides=(1, 1), padding="same")(relu)

to

pool = AveragePooling2D(pool_size=(16, 16), strides=(1, 1), padding="same")(relu)
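One caveat worth checking: in Keras, a pooling layer with strides=(1, 1) and padding="same" keeps the spatial size unchanged whatever pool_size is, so the number of features reaching Flatten (and hence the Dense parameters after it) only shrinks with a larger pool if padding is "valid". A small sketch of the per-axis output-size arithmetic, assuming a 16x16 final feature map for 64x64 inputs (two stride-2 stages); pooled_size is an illustrative helper, not a Keras function:

```python
import math


def pooled_size(n, pool, stride, padding):
    """Output length along one axis of a Keras-style pooling layer."""
    if padding == "same":
        # "same" padding: output depends only on the stride.
        return math.ceil(n / stride)
    if padding == "valid":
        # "valid" padding: the window must fit entirely inside the input.
        return (n - pool) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")
```

With 512 channels (k=8), "same" padding gives 16 * 16 * 512 = 131072 flattened features for either pool size, whereas "valid" padding with a 16x16 pool gives 1 * 1 * 512 = 512.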

yu4u, Jul 19 '17

The Theano weights work well.

galoiscch, Jul 20 '17

After running the command python train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 32, I can run the training program with TensorFlow on the GPU. However, when I test the new weight file, the following error appears:

Using TensorFlow backend.
Traceback (most recent call last):
  File "demo.py", line 97, in <module>
    main()
  File "demo.py", line 25, in main
    model.load_weights(os.path.join("pretrained_models", "weights.15-4.02.hdf5"))
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2572, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2981, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 19 layers into a model with 31 layers.

I wonder if it is due to the options I changed on the command line. I didn't change the number of parameters. Thanks.

galoiscch, Jul 21 '17

The weight file is indeed much smaller: your weight file is 195.8 MB, while mine is just 63.7 MB.

galoiscch, Jul 21 '17

demo.py is just a demo script, which assumes the pre-trained model is used, as you can see:

model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

However, I have added options to demo.py for specifying the weight file and the depth and width parameters. Please refer to the latest version of demo.py.
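With such options, loading self-trained weights comes down to passing the same depth and width that were used for training; the ValueError above (a 19-layer weight file loaded into a 31-layer model) is what a mismatch looks like. A minimal sketch of that kind of command-line handling; the flag names here are hypothetical and may differ from the real demo.py:

```python
import argparse


def get_args(argv=None):
    # Hypothetical flag names; check the current demo.py for the real ones.
    parser = argparse.ArgumentParser(description="Age/gender demo with a configurable WideResNet.")
    parser.add_argument("--weight_file", type=str, required=True,
                        help="path to the .hdf5 weight file")
    parser.add_argument("--depth", type=int, default=16,
                        help="WideResNet depth used at training time")
    parser.add_argument("--width", type=int, default=8,
                        help="WideResNet width factor k used at training time")
    return parser.parse_args(argv)
```

The script would then build WideResNet(img_size, depth=args.depth, k=args.width)() before calling model.load_weights(args.weight_file), so that, for example, --depth 10 --width 4 matches a model trained with those settings.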

yu4u, Jul 21 '17

Regarding the weight file being much smaller (195.8 MB vs. 63.7 MB): the options --depth 10 --width 4 control the number of parameters in the CNN, so it is natural that the size of the weight file changes.
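To get a feel for how strongly --depth and --width drive the model size, here is a rough count of convolution weights for a standard Wide ResNet (WRN-depth-k with (depth - 4) / 6 residual blocks per group and widths 16, 16k, 32k, 64k). It is a sketch under those assumptions: it ignores biases, batch normalization, and the Dense age/gender heads, so it will not match the repository's wide_resnet.py or the file sizes exactly.

```python
def wrn_conv_params(depth, k):
    """Count conv weights (3x3 convs plus 1x1 projection shortcuts) in a standard WRN-depth-k."""
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must satisfy (depth - 4) % 6 == 0")
    n = (depth - 4) // 6  # residual blocks per group
    widths = [16, 16 * k, 32 * k, 64 * k]
    total = 3 * 3 * 3 * widths[0]  # stem 3x3 conv on an RGB image
    for g, stride in enumerate((1, 2, 2)):
        c_in, c_out = widths[g], widths[g + 1]
        for b in range(n):
            block_in = c_in if b == 0 else c_out
            total += 3 * 3 * block_in * c_out  # first 3x3 conv of the block
            total += 3 * 3 * c_out * c_out     # second 3x3 conv of the block
            if b == 0 and (c_in != c_out or stride != 1):
                total += c_in * c_out          # 1x1 projection shortcut
    return total
```

Under these assumptions, wrn_conv_params(16, 8) is 10949040 and wrn_conv_params(10, 4) is 1194416, roughly nine times fewer convolution weights.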

yu4u, Jul 21 '17

Much obliged. I can now run demo.py with my weight file.

galoiscch, Jul 24 '17

Hi, did you run it with the TensorFlow backend on a GPU?

sbharadwajj, Oct 15 '18

I think I tried running the program with the TensorFlow backend on a GPU, but it failed. It has been a long time, and my memory of this project has become quite rusty. I am sorry about that.

galoiscch, Oct 16 '18

Thank you. @yu4u, do you run it on a GPU? Do you have any suggestions on how to make it work on a GPU?

sbharadwajj, Oct 17 '18

I have not run demo.py on a machine with GPUs, but I think it works. Is there a specific problem?

yu4u, Oct 17 '18

Works perfectly with tensorflow-gpu 1.10.

sbharadwajj, Oct 17 '18

@galoiscch

I'm running into memory issues when running this in a conda env with PyTorch GPU and TensorFlow GPU (face detection is done by Tencent DSFD instead of dlib): https://github.com/TencentYoutuResearch/FaceDetection-DSFD

So I want to use a shallower and narrower model. Can you provide the weights for the smaller model?

nyck33, Sep 11 '19