C3D-tensorflow

After some training steps, the weights and loss value become NaN.

Open · zeynepgokce opened this issue 6 years ago · 5 comments

Hello everyone,

I have a question about training this model with a different dataset.

When I fine-tune the C3D model on UCF101 data, there is no problem. But when I switch to a different dataset, the loss becomes NaN.

I tried the following, none of which solved the problem:

  1. changed the learning rate
  2. changed the batch size
  3. tried a small train/test split (with #num: 3)

For instance, below are several training steps of the same model on a different dataset. The learning rate and batch size are the same as in the original model; only the dataset has changed.

('Step : ', 31)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.00000
(' Loss : ', array([ 2.60935736,  2.62928033,  2.65052104,  2.6719377 ,  2.69320059,
        2.7358048 ,  2.73551226,  2.73502755,  3.95449066,  3.61877584,
        2.61726952,  2.60790229,  2.60790229,  2.60790229,  2.60790229,
        2.60790229,  2.60790229,  2.60790229,  2.60790229,  2.60790229,
        2.60790229,  2.60790229], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.00000
('Step : ', 32)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.10000
(' Loss : ', array([ 3.21029162,  3.2302146 ,  3.25145531,  3.27287173,  3.29413438,
        3.33673644,  3.33644032,  3.33594394,  4.55498886,  4.21940422,
        3.21820426,  3.20883656,  3.20883656,  3.20883656,  3.20883656,
        3.20883656,  3.20883656,  3.20883656,  3.20883656,  3.20883656,
        3.20883656,  3.20883656], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.20000
('Step : ', 33)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.30000
(' Loss : ', array([ 2.50483251,  2.52475548,  2.54599619,  2.56741214,  2.58867455,
        2.63127494,  2.63097525,  2.63046718,  3.84910154,  3.51364231,
        2.51274562,  2.50337744,  2.50337744,  2.50337744,  2.50337744,
        2.50337744,  2.50337744,  2.50337744,  2.50337744,  2.50337744,
        2.50337744,  2.50337744], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.00000
('Step : ', 34)
------------------------------------------------------------------
 TRAIN DATA READING  ...
Training Data Eval:
accuracy: 0.00000
(' Loss : ', array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan], dtype=float32))
 TEST DATA READING  ...
Validation Data Eval:
accuracy: 0.00000

Where could the problem be? Why doesn't this model work with a different dataset? Any suggestions? Thank you.

zeynepgokce · May 03 '18 02:05

Try using this loss instead:

    labels = tf.one_hot(labels_placeholder, c3d_model.NUM_CLASSES)
    loss = -tf.reduce_sum(labels * tf.log(tf.clip_by_value(tf.nn.softmax(logit), 1e-10, 1.0)), 1)

zkjqw139 · May 30 '18 06:05
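Why the clipping can help: when the softmax saturates, some class probabilities underflow to exactly 0.0, log(0) is -inf, and the zero entries of the one-hot labels then multiply that -inf, giving 0 * -inf = NaN even when the prediction is correct. Below is a minimal pure-Python sketch of that arithmetic, with no TensorFlow involved. softmax, safe_log, one_hot and cross_entropy are illustrative helpers, not part of the repo, and safe_log returning -inf is an assumption standing in for tf.log, since Python's math.log(0.0) raises instead.

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating; tiny values still underflow to 0.0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def safe_log(p):
    # Stand-in for tf.log: return -inf for 0.0 instead of raising like math.log.
    return math.log(p) if p > 0.0 else -math.inf

def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def cross_entropy(logits, label, num_classes, eps=None):
    """-sum(labels * log(softmax(logits))), optionally clipping probs to [eps, 1]."""
    probs = softmax(logits)
    if eps is not None:
        # Same idea as tf.clip_by_value(probs, 1e-10, 1.0) in the suggested loss.
        probs = [min(max(p, eps), 1.0) for p in probs]
    labels = one_hot(label, num_classes)
    # A zero label times log(0) = -inf gives NaN, which poisons the whole sum.
    return -sum(y * safe_log(p) for y, p in zip(labels, probs))

saturated = [0.0, 1000.0, 0.0]  # softmax gives [0.0, 1.0, 0.0] after underflow
print(cross_entropy(saturated, 1, 3))             # unclipped: nan
print(cross_entropy(saturated, 1, 3, eps=1e-10))  # clipped: 0.0 (printed as -0.0)
print(cross_entropy(saturated, 0, 3, eps=1e-10))  # clipped: about 23.03
```

With eps=1e-10 the per-example loss is capped at -log(1e-10) ≈ 23.03. Clipping also zeroes the gradient for probabilities below eps, so TensorFlow's fused tf.nn.softmax_cross_entropy_with_logits or tf.nn.sparse_softmax_cross_entropy_with_logits, which work directly on raw logits in a numerically stable way, are usually preferred over a hand-written log(softmax(...)).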

@zeynepgokce Hi, could you tell me how to print the loss? Where should I add the code? Thank you.

491506870 · Aug 30 '18 03:08

@491506870 Hi, I simply added loss to the list of tensors fetched by sess.run, as follows:

    summary, acc, l = sess.run([merged, accuracy, loss], feed_dict={images_placeholder: train_images, labels_placeholder: train_labels})
    print("accuracy: " + "{:.5f}".format(acc))
    print(" Loss : ", l)

zeynepgokce · Aug 30 '18 09:08
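Building on the sess.run call above, one way to catch the problem at the exact step where it appears (step 34 in the log in the first post), instead of reading the log by eye, is to test the fetched loss values for NaN/inf right after each call. This is a minimal sketch. first_non_finite and assert_finite_loss are hypothetical helpers, and l is assumed to be the per-example loss array returned by sess.run as in the snippet above.

```python
import math

def first_non_finite(values):
    """Return the index of the first NaN/inf entry in values, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None

def assert_finite_loss(step, loss_values):
    # Hypothetical check, meant to run right after sess.run(...) returns the loss array.
    # float() also accepts NumPy float32 scalars such as the entries of l.
    idx = first_non_finite(float(v) for v in loss_values)
    if idx is not None:
        raise FloatingPointError(f"non-finite loss at step {step}, element {idx}")

# Example with values shaped like the log above:
assert_finite_loss(33, [2.50483251, 2.52475548])  # finite: passes silently
```

In a training loop this would be called as assert_finite_loss(step, l) after the sess.run line. TensorFlow 1.x also offers tf.check_numerics(tensor, message) to do a similar check inside the graph.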

@zeynepgokce Thank you so much, that helps me a lot! I think I have the same problem as you: my loss became NaN after several steps. What did you do to solve it?

491506870 · Aug 31 '18 03:08

@491506870 The problem was related to how I labelled my own dataset. I changed my labels to run from 0 to 2, since I have 3 classes.

zeynepgokce · Aug 31 '18 07:08
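A possible explanation for why the labelling fix works: if the training script computes its loss with tf.nn.sparse_softmax_cross_entropy_with_logits, its documentation requires every label to lie in [0, num_classes). Out-of-range values raise an error on CPU but give NaN loss and gradient rows on GPU, so labels 1 to 3 with 3 classes could produce this symptom. Whether C3D-tensorflow's training script uses that op is an assumption here. Below is a minimal sketch that remaps arbitrary labels to 0..N-1 and checks the range before training. remap_labels and check_labels are hypothetical helpers, and num_classes should match c3d_model.NUM_CLASSES.

```python
def remap_labels(raw_labels):
    """Map arbitrary class labels (e.g. 1, 2, 3) to contiguous indices 0..N-1."""
    classes = sorted(set(raw_labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in raw_labels], index

def check_labels(labels, num_classes):
    """Raise if any label falls outside [0, num_classes)."""
    bad = sorted({y for y in labels if not 0 <= y < num_classes})
    if bad:
        raise ValueError(f"labels out of range [0, {num_classes}): {bad}")

# Labels numbered 1..3 for a 3-class dataset fail the check as-is:
raw = [1, 3, 2, 3, 1]
labels, mapping = remap_labels(raw)
check_labels(labels, 3)  # passes after remapping
print(labels)   # [0, 2, 1, 2, 0]
print(mapping)  # {1: 0, 2: 1, 3: 2}
```

Keeping the mapping around makes it possible to translate predicted indices back to the original class names or IDs after training.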