
AssertionError: Model diverged with loss = NaN

Open XuanheLiu opened this issue 6 years ago • 9 comments

When training yolo_net directly without loading yolo_tiny.ckpt, this error occurs: AssertionError: Model diverged with loss = NaN.

XuanheLiu avatar Nov 07 '17 11:11 XuanheLiu

@XuanheLiu Did you solve the problem? I ran into the same issue.

wenbowen123 avatar Nov 25 '17 18:11 wenbowen123

@wenbowen123 The problem has been solved: keep the learning rate small at first, wait for the loss value to drop, then increase it, and later reduce it again.

XuanheLiu avatar Nov 26 '17 04:11 XuanheLiu

@XuanheLiu Thank you! What do the trained results look like? How is the accuracy?

wenbowen123 avatar Nov 26 '17 04:11 wenbowen123

@wenbowen123 I'm not sure I trained it well. The weights from my training are not as good as the author's. I remember the loss didn't drop very low, but I don't remember the exact value.

XuanheLiu avatar Nov 26 '17 05:11 XuanheLiu

@wenbowen123 So how did you solve the error? I ran into the same problem. Thanks a lot.

ghost avatar Jan 11 '18 06:01 ghost

The model diverges if the training process changes the weights too much and the loss becomes larger, or even extremely large. Try reducing the standard deviation used to initialize the weight variables and the constant used to initialize the bias variables, so that the initial weights and biases are relatively small. You can also consider lowering the learning rate. As far as I know, model divergence is caused by those (hyper)parameters. Cheers
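
For example, a minimal sketch in TensorFlow 1.x (the shapes and variable names here are illustrative, not taken from this repo):

import tensorflow as tf

# a small standard deviation keeps the initial activations, and thus the loss, small
weights = tf.get_variable(
    'weights', shape=[3, 3, 16, 32],
    initializer=tf.truncated_normal_initializer(stddev=0.01))
# a small constant bias instead of a large one
biases = tf.get_variable(
    'biases', shape=[32],
    initializer=tf.constant_initializer(0.0))
# a lower learning rate also reduces the risk of divergence
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1e-4)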

Fju avatar Feb 26 '18 17:02 Fju

assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
AssertionError: Model diverged with loss = NaN

When I run this project with Python 3, training produces NaN, but when I run it with Python 2, the model converges. @XuanheLiu @Fju @nilboy

liuguiyangnwpu avatar Mar 07 '18 03:03 liuguiyangnwpu

Does it make any sense that it works in Python 2 but not in Python 3?
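
One plausible explanation (an assumption, not something confirmed in this thread) is the change in division semantics: in Python 2, / between two ints truncates, while in Python 3 it returns a float, which can silently change quantities such as grid-cell indices in the loss computation:

# Python 2: 7 / 2 == 3   (integer division)
# Python 3: 7 / 2 == 3.5 (true division)
# Code written for Python 2 that relies on truncation must use // under Python 3:
cell = 7 // 2  # 3 in both Python 2 and Python 3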

adr-arroyo avatar May 25 '18 07:05 adr-arroyo

You need to compute and apply the gradients separately, using the following process:

import tensorflow as tf

opt = tf.train.AdamOptimizer(0.1)
# compute gradients of the loss (not the logits) with respect to all trainable variables
gvs = opt.compute_gradients(loss)
# clip each gradient element to [-1, 1]; skip variables that have no gradient
capped_gvs = [(tf.clip_by_value(g, -1., 1.), v) for g, v in gvs if g is not None]
train_op = opt.apply_gradients(capped_gvs)

This limits the range of the computed gradients and prevents the model from diverging.
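
An alternative worth trying (a sketch, assuming loss is the scalar training loss) is global-norm clipping, which rescales all gradients together instead of clipping each element independently:

opt = tf.train.AdamOptimizer(1e-4)
grads_and_vars = opt.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
# rescale the whole gradient list so its global L2 norm is at most 5.0
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = opt.apply_gradients(zip(clipped, variables))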

hwade avatar Sep 04 '18 05:09 hwade