Semantic-Segmentation-Suite

Different results when checkpoints are restored

Open · FSet89 opened this issue 7 years ago • 2 comments

  • What are your command line arguments?: python predict.py --image /path/to/image --checkpoint_path /path/to/checkpoint --model FRRN-A --dataset <dataset_name> --crop_width 256 --crop_height 256

  • Have you written any custom code?: I added is_training=is_training to the batch_norm parameters in the model builder, which is True during training and False during prediction

  • What have you done to try and solve this issue?: I tried changing the image loading pipeline, without success.

  • TensorFlow version?: 1.7.0
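For reference, the is_training flag described above switches batch normalization between two modes. The sketch below is a simplified illustration in pure Python on a flat list of scalar activations, omitting the learned scale and offset; it is not TensorFlow's implementation, and the function name batch_norm and the eps value are assumptions. In training mode the layer normalizes with the current batch's mean and variance; in inference mode it uses stored moving averages.

```python
import math


def batch_norm(x, moving_mean, moving_var, is_training, eps=1e-3):
    """Simplified batch norm over a list of scalar activations (no gamma/beta)."""
    if is_training:
        # Training mode: normalize with the statistics of the current batch.
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
    else:
        # Inference mode: normalize with the stored moving averages.
        mean, var = moving_mean, moving_var
    return [(v - mean) / math.sqrt(var + eps) for v in x]


batch = [4.0, 5.0, 6.0]
# Moving statistics left at typical initial values (mean 0, variance 1).
train_out = batch_norm(batch, moving_mean=0.0, moving_var=1.0, is_training=True)
infer_out = batch_norm(batch, moving_mean=0.0, moving_var=1.0, is_training=False)
print("training mode: ", [round(v, 3) for v in train_out])
print("inference mode:", [round(v, 3) for v in infer_out])
```

With the moving statistics still at their initial values, inference mode barely changes the raw activations instead of centring them, which is one way training-time and prediction-time outputs can diverge for the same image.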

Describe the problem

While the validation images are correctly segmented at training time, the same images are not correctly segmented when I run the predict script (i.e. when the checkpoints are restored). I tried both the latest checkpoint and a previous one.

FSet89 · Nov 07 '18 10:11

This sounds almost certainly like a batch norm issue. The default batch size is one, and 256x256 isn't huge, so the statistics batch norm uses at test time may not closely match the per-batch normalization it applies during training. I don't think anything is wrong with the checkpoint.

I'm not the author, so take this with a grain of salt, but the very first thing I would try is increasing the batch size. Since these are fully convolutional models, even a batch size of 4 or so might be enough to stabilize the batch norm statistics and bring your training and test results closer together.
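To illustrate why batch size matters, the sketch below measures how much the per-batch mean fluctuates when batches contain 1 image versus 4. It is pure Python, and the per-image mean activations are synthetic Gaussian values standing in for real feature maps (an assumption for illustration only); the helper batch_mean_spread is hypothetical. The noisier the per-batch statistics, the further training-time normalization can drift from the averaged statistics used at prediction time.

```python
import random
import statistics

random.seed(0)
# Hypothetical per-image mean activations for one channel (synthetic data).
image_means = [random.gauss(0.0, 1.0) for _ in range(400)]


def batch_mean_spread(values, batch_size):
    """Population std. dev. of the batch means after splitting values into batches."""
    batch_means = [
        statistics.mean(values[i:i + batch_size])
        for i in range(0, len(values), batch_size)
    ]
    return statistics.pstdev(batch_means)


spread_1 = batch_mean_spread(image_means, 1)
spread_4 = batch_mean_spread(image_means, 4)
print(f"batch size 1: {spread_1:.3f}  batch size 4: {spread_4:.3f}")
```

For independent images the spread should shrink roughly by a factor of sqrt(batch size), so about half for a batch size of 4.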

CJMenart · Nov 08 '18 13:11

What do you mean by "correctly segmented"?

If the output image at the prediction stage is totally wrong, did you try adding

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # the loss and optimizer definitions follow here, indented under this block

before the loss definition in train.py, so that batch norm's moving mean and variance are updated at every training step? Ref: https://stackoverflow.com/questions/41666964/model-variables-in-tensorflows-batch-norm
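For context on why this matters: per the TF 1.x documentation, tf.contrib.layers.batch_norm places the ops that update the moving mean and variance in tf.GraphKeys.UPDATE_OPS by default instead of running them automatically. If the training op does not depend on them, the moving statistics never leave their initial values, and predictions with is_training=False are normalized with the wrong numbers. The simulation below illustrates this in pure Python; the single scalar channel, the synthetic data centred at 5.0, the batch size of 16 and the momentum of 0.99 are assumptions chosen for illustration, and the update step only mimics what those ops do.

```python
import random

random.seed(1)
MOMENTUM = 0.99  # assumed decay for the moving averages


def train(num_steps, run_update_ops):
    """Return (moving_mean, moving_var) after simulated training steps."""
    moving_mean, moving_var = 0.0, 1.0  # typical initial values
    for _ in range(num_steps):
        # Synthetic batch of activations centred around 5.0 with std 2.0.
        batch = [random.gauss(5.0, 2.0) for _ in range(16)]
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
        if run_update_ops:
            # Mimics the ops in UPDATE_OPS: exponential moving averages.
            moving_mean = MOMENTUM * moving_mean + (1 - MOMENTUM) * mean
            moving_var = MOMENTUM * moving_var + (1 - MOMENTUM) * var
    return moving_mean, moving_var


with_updates = train(2000, run_update_ops=True)
without_updates = train(2000, run_update_ops=False)
print("with update ops:   ", [round(s, 2) for s in with_updates])
print("without update ops:", [round(s, 2) for s in without_updates])
```

Without the updates the moving statistics are still exactly (0.0, 1.0) after training, while with them they settle near the statistics of the training batches.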

Alternatively, you could simply change is_training=False to True in predict.py (though I don't recommend doing this).

ryohachiuma · Dec 21 '18 07:12