Mask_R_CNN_Keypoints

Error in Softmax

Open filipetrocadoferreira opened this issue 7 years ago • 10 comments

ResourceExhaustedError (see above for traceback): Ran out of GPU memory when allocating 0 bytes for 
	 [[Node: mrcnn_mask_loss/SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](mrcnn_mask_loss/Reshape_3, mrcnn_mask_loss/Reshape_4)]]
	 [[Node: training/SGD/gradients/roi_align_mask/ExpandDims_1_grad/Reshape/_1987 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7770_training/SGD/gradients/roi_align_mask/ExpandDims_1_grad/Reshape", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Jan 10 '18 17:01 filipetrocadoferreira

@filipetrocadoferreira I have the same problem: the GPU reports insufficient memory. I worked around it by changing the code as follows, but I do not know how this will affect performance.

    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather(pred_masks, positive_ix)
    y_pred = tf.nn.softmax(y_pred, dim=1)

    loss = []
    for i in range(14):
        loss.append(K.switch(tf.greater(tf.reduce_sum(target_class_ids), 0),
                             -tf.reduce_sum(y_true[:, :, i] * tf.log(y_pred[:, :, i]), 1),
                             tf.constant(0.0)))
    loss = K.mean(tf.stack(loss))

Which graphics card did you use, and how much memory does it have?

Jan 11 '18 10:01 RodrigoGantier

I don't think it's a memory problem; I think the error is caused by an empty tensor.

Jan 11 '18 10:01 filipetrocadoferreira

Please check this: https://github.com/tensorflow/tensorflow/issues/6766#issuecomment-356697028

Jan 11 '18 11:01 filipetrocadoferreira

This drove me crazy for a long time, thank you very much. I'll make the change in the code. The cause seems to be this:

    import tensorflow as tf
    import numpy as np
    y = tf.placeholder(tf.float64, [None, 1])
    out = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y)
    sess = tf.Session()
    sess.run(out, {y: np.zeros([0, 1])})

ResourceExhaustedError (see above for traceback): Ran out of GPU memory when allocating 0 bytes for [[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Reshape, Reshape)]]

I do not know if it's a TensorFlow bug (running out of memory while allocating 0 bytes for a size-0 vector), but your solution is really good.
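For anyone reading later, here is a minimal, framework-free sketch of the general idea of guarding against an empty batch before computing a softmax cross-entropy. It is plain Python rather than TensorFlow, the names `softmax_cross_entropy` and `safe_mean_loss` are hypothetical, and it is not necessarily the change described in the linked comment.

```python
import math


def softmax_cross_entropy(labels, logits):
    """Cross-entropy for one row: -sum(labels * log_softmax(logits))."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(labels, logits))


def safe_mean_loss(batch_labels, batch_logits):
    """Mean loss over the batch, or 0.0 when the batch has no rows."""
    if len(batch_logits) == 0:
        # Assumption: an empty batch should contribute no loss.
        return 0.0
    losses = [softmax_cross_entropy(t, x)
              for t, x in zip(batch_labels, batch_logits)]
    return sum(losses) / len(losses)


print(safe_mean_loss([], []))                      # empty batch
print(safe_mean_loss([[1.0, 0.0]], [[0.0, 0.0]]))  # one row, two classes
```

In a graph-mode setting the same check would have to be expressed with graph ops (the snippets in this thread use `K.switch` for that purpose) rather than a Python `if`.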

Jan 12 '18 03:01 RodrigoGantier

Thanks! I'm still struggling to get results on the COCO dataset.

Jan 12 '18 08:01 filipetrocadoferreira

Hi all, I am also struggling with the COCO dataset. I tried another way:

    y_true = tf.gather(target_masks, positive_ix)  # shape: [N, Area, kpts]
    y_pred = tf.gather(pred_masks, positive_ix)    # shape: [N, Area, kpts]
    pred = []
    target = []
    for ii in range(0, 12):
        kpt_t = y_true[:, :, ii]  # shape: [N, Area]
        # Find all ROIs that contain the corresponding keypoint
        y = tf.reduce_sum(kpt_t, axis=1)
        # If y[k] == 0, then kpt[ii] is missing in that ROI
        pos_kpt_ix = tf.where(y > 0)[:, 0]
        kpt_t = tf.gather(kpt_t, pos_kpt_ix)
        kpt_p = y_pred[:, :, ii]
        kpt_p = tf.gather(kpt_p, pos_kpt_ix)
        target.append(kpt_t)
        pred.append(kpt_p)

    loss = []
    for ii in range(0, 12):
        logits = pred[ii]
        eps = tf.constant(value=1e-4)
        labels = tf.to_float(target[ii])
        softmax = tf.nn.softmax(logits) + eps
        cross_entropy = -tf.reduce_sum(labels * tf.log(softmax), reduction_indices=[1])
        cross_entropy_mean = K.switch(tf.size(target[ii]) > 0,
                                      tf.reduce_mean(cross_entropy),
                                      tf.constant(0.0))
        loss.append(cross_entropy_mean)
    loss = tf.stack(loss)

But this leads to worse results; even the bounding box predictions are wrong.
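For reference, the filtering idea in that snippet (skip the ROIs where a keypoint is missing, then average the cross-entropy over the rest) can be sketched without TensorFlow. This is only an illustration in plain Python with hypothetical names (`log_softmax`, `keypoint_loss`). It also swaps the `softmax + eps` step for a log-sum-exp log-softmax, which is a substitution and not what the snippet does. Whether that difference has anything to do with the worse results is not established here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def keypoint_loss(targets, logits):
    """Mean cross-entropy for ONE keypoint over the ROIs that contain it.

    targets, logits: lists of rows, one row per ROI, each of length Area.
    A row of targets that sums to 0 means the keypoint is missing in that
    ROI, so the row is skipped. Returns 0.0 if every ROI is skipped.
    """
    losses = []
    for t_row, x_row in zip(targets, logits):
        if sum(t_row) <= 0:
            continue  # keypoint missing in this ROI
        ls = log_softmax(x_row)
        losses.append(-sum(t * l for t, l in zip(t_row, ls)))
    return sum(losses) / len(losses) if losses else 0.0


# Two ROIs with Area = 3; the keypoint is missing in the second one.
targets = [[0, 1, 0], [0, 0, 0]]
logits = [[0.0, 0.0, 0.0], [5.0, 1.0, 2.0]]
print(keypoint_loss(targets, logits))
```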

Jan 12 '18 13:01 QtSignalProcessing

@filipetrocadoferreira do you have an email or some other way we can talk, so we can work on this together? My email is [email protected]; please send me a message.

Jan 13 '18 11:01 RodrigoGantier

Hey, I think we should keep the discussion open here on GitHub so that everyone can participate.

I'm facing very slow training. How long do you train your models for?

Jan 15 '18 10:01 filipetrocadoferreira

@filipetrocadoferreira Did your loss finally converge? Mine doesn't, and it has been confusing me for days.

Mar 20 '18 04:03 Superlee506

Nope. In the end, I used Detectron.

Mar 20 '18 09:03 filipetrocadoferreira