ICNet-tensorflow

Implement Validation In train.py

Open BCJuan opened this issue 6 years ago • 12 comments

Hi @hellochick, I am trying, so far without any luck, to obtain validation loss results while the model is training.

I have tried to feed the network a batch of images from my validation dataset as in

net = ICNet_BN({'data': image_batch}, is_training=False, num_classes=args.num_classes, filter_scale=args.filter_scale)

but now with 'data': image_batch_validation. It then fails, saying that the variables already exist. I have also tried to call net directly and feed data to it, but it says that net is not callable.

I do not know how to obtain, at each step, the loss and metrics on the validation dataset, using the current weights of the network at that step and without training on that data.

So the main problem is with feeding the validation data to the net.

Hope I have explained myself properly.

Thank you in advance.

best

BCJuan avatar May 10 '18 17:05 BCJuan

Hi, if anyone is interested, I managed to include validation a few weeks ago. I did it in a dirty way: creating another net object for validation and copying the weights to it.

Also made changes to:

  • argument parser to include option for doing validation or not. Just use it as --validation true
  • function for copying structure
  • another data reader for validation data
  • new ICNet network object under name scope 'val'
  • changed the trainable variables and the L2 loss so that the validation net's variables are not taken into account

Validation loss is calculated each time the model is saved.

I think this is all.

As you might know, this is useful for checking whether you are overfitting.

Best

As I cannot insert the code file I will put here the code changes:

  • Function for copying
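def print_assign_vars(sess):
    # Copy each trainable training variable into its counterpart under the "val" scope.
    train_vars = {v.name: v for v in tf.trainable_variables()
                  if not v.name.startswith("val/")}
    for v in tf.global_variables():
        if v.name.startswith("val/"):
            # Drop the leading "val/" to get the name of the training variable.
            f_name = v.name[len("val/"):]
            if f_name in train_vars:
                sess.run(v.assign(train_vars[f_name]))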
  • new argument in parser
    parser.add_argument('--validation', type=str2bool, nargs='?',const=True, default=VALIDATION,
                        help='To make validation')

with this function

def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')
  • The new reader
        reader_2 = ImageReader(
            DATA_DIR_2,
            DATA_LIST_PATH_2,
            input_size,
            args.random_scale,
            args.random_mirror,
            args.ignore_label,
            IMG_MEAN,
            coord)
        image_batch_val, label_batch_val = reader_2.dequeue(args.batch_size)
    
  • the new net
    with tf.variable_scope("val"):
        net_val = ICNet_BN({'data': image_batch_val}, is_training=True, num_classes=args.num_classes, filter_scale=args.filter_scale)
        
  • changes in trainable variables and L2 losses
    # 'val' is checked first so validation variables stay excluded even when train_beta_gamma is set.
    all_trainable = [v for v in tf.trainable_variables() if 'val' not in v.name and (('beta' not in v.name and 'gamma' not in v.name) or args.train_beta_gamma)]

    l2_losses = [args.weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables() if ('weights' in v.name and 'val' not in v.name)]
  • loss calculation
    #######################FOR VALIDATION
    
    sub4_out_val = net_val.layers['sub4_out']
    sub24_out_val = net_val.layers['sub24_out']
    sub124_out_val = net_val.layers['conv6_cls']
    
    loss_sub4_val = create_loss(sub4_out_val, label_batch_val, args.num_classes, args.ignore_label)
    loss_sub24_val = create_loss(sub24_out_val, label_batch_val, args.num_classes, args.ignore_label)
    loss_sub124_val = create_loss(sub124_out_val, label_batch_val, args.num_classes, args.ignore_label)
    l2_losses_val = [args.weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables() if ('weights' in v.name and 'val' in v.name)]
    
    reduced_loss_val = LAMBDA1 * loss_sub4_val +  LAMBDA2 * loss_sub24_val + LAMBDA3 * loss_sub124_val + tf.add_n(l2_losses_val)
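The name matching behind the copy function can be sketched without TensorFlow. This is only an illustration, assuming the validation net lives under a "val/" scope prefix; the helper names and the variable names below are hypothetical and not taken from the repo. Matching on the exact prefix avoids false hits on names that merely contain the letters "val".

```python
def strip_scope(name, scope="val"):
    """Return name without the leading '<scope>/' prefix, or None if it is absent."""
    prefix = scope + "/"
    return name[len(prefix):] if name.startswith(prefix) else None


def match_val_to_train(all_names, trainable_names, scope="val"):
    """Map each validation variable name to the training variable it mirrors."""
    trainable = set(trainable_names)
    pairs = {}
    for name in all_names:
        stripped = strip_scope(name, scope)
        if stripped is not None and stripped in trainable:
            pairs[name] = stripped
    return pairs


# Hypothetical variable names, for illustration only.
train_names = ["conv1_1/weights:0", "conv1_1/biases:0", "interval/weights:0"]
val_names = ["val/" + n for n in train_names]
pairs = match_val_to_train(train_names + val_names, train_names)
for val_name in sorted(pairs):
    print(val_name, "<-", pairs[val_name])
```

In the real graph, each pair would presumably become one assign op created once before the training loop, since calling v.assign(...) on every copy adds new ops to the graph each time.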

And that's it. Hope it helps.

If anyone is interested I can send the code.

BCJuan avatar May 28 '18 16:05 BCJuan

@BCJuan have you tried to train this on your own dataset?

PratibhaT avatar Jun 18 '18 22:06 PratibhaT

@PratibhaT Yes, I have. I attach an image showing the mIoU as well as the training and validation loss (green for validation, blue for training). [image: loss_pic]

BCJuan avatar Jun 19 '18 09:06 BCJuan

@BCJuan Can you suggest how to prepare the data for it? I have labeled my dataset, so I have one .json file containing the polygon points for the whole training set (made with the VIA annotation tool). But going through list.txt, I found that the labels are referenced as bitmap .png images. So, do I need to prepare my data and labels in a similar format? If yes, how? If not, how can I train directly with images and .json file labels?

PratibhaT avatar Jun 19 '18 18:06 PratibhaT

You should have a .txt file with two columns separated by a space: the input images in the first, and the labels in .png format in the second. If what you are asking is how to convert .json files to images, I cannot help you, but I am pretty sure you will find dozens of snippets on the Internet that do that. Best
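As a rough sketch of that two-column format, the snippet below checks a list file before training. The file paths are made up, and the exact format the repo's ImageReader expects should be confirmed against its source; this only assumes one "image label" pair per line, separated by whitespace.

```python
def read_pairs(lines):
    """Parse 'image_path label_path' lines into (image, label) tuples."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 columns, got %d" % (number, len(fields)))
        pairs.append((fields[0], fields[1]))
    return pairs


# Hypothetical list contents, for illustration only.
example = [
    "images/0001.jpg labels/0001.png\n",
    "\n",
    "images/0002.jpg labels/0002.png\n",
]
pairs = read_pairs(example)
print(pairs)
```

With a real file this would be called as read_pairs(open(path)), where path points to your own list file.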

BCJuan avatar Jun 20 '18 10:06 BCJuan

Nice work @BCJuan! I would like to implement your solution as well (I train with my own dataset too). Could you please show me the code where you call sess.run(...) for the val net? How do you calculate the mIoU during training?

alexw92 avatar Jun 23 '18 13:06 alexw92

Hi @alexw92

Where you have the sess.run(reduced_loss,...) add something like

if args.validation:
    sess.run(reduced_loss_val)

This is to add validation loss and to be able to record it.

For miou put a statement after the outputs of the layers like:

mIoU, update_op_m = tf.metrics.mean_iou( good_label_re, good_pred_re, num_classes=args.num_classes)

where good_label_re and good_pred_re are just the outputs prepared for evaluation of the metric. There are examples of this in both the training code and the evaluation code.
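For reference, the quantity being tracked can be sketched in plain Python. This is not the TensorFlow implementation, just a minimal version assuming flat lists of integer class ids and an optional ignore label; the function name and the sample values are illustrative.

```python
def mean_iou(labels, preds, num_classes, ignore_label=None):
    """Mean intersection-over-union over classes that appear at least once."""
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, preds):
        if t == ignore_label:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(row[c] for row in confusion) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / float(denom))
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: two classes, label 255 is ignored.
labels = [0, 0, 1, 1, 255]
preds = [0, 1, 1, 1, 0]
miou = mean_iou(labels, preds, num_classes=2, ignore_label=255)
print(miou)
```

Keep in mind that tf.metrics.mean_iou accumulates its confusion matrix in local variables, so those need initializing with tf.local_variables_initializer() and the update op has to be run for the value to change.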

BCJuan avatar Jun 23 '18 15:06 BCJuan

@BCJuan Thank you, I found the lines in evaluation and it should be easy to get this working.

Did you call sess.run(reduced_loss_val) in a loop running num_val_images/batch_size times in order to validate the model on the whole validation set? I would like to iterate over the whole val set, similar to the code in evaluate.py, but I don't know how to do this using ImageReader.

alexw92 avatar Jun 23 '18 15:06 alexw92

Nice point @alexw92

No, I do not iterate through the whole validation dataset, just a batch of it. I do not really know how to pass the entire data set. I think you would have to mount another reader. I just do not know. But good point. I will appreciate it very much if you achieve it and let me know about it. Thank u.
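One possible way to cover the whole set, sketched under assumptions: run the validation loss for ceil(num_val_images / batch_size) steps and average the results. Here run_batch is a hypothetical stand-in for something like lambda: sess.run(reduced_loss_val), and the numbers are made up.

```python
import math


def validate(run_batch, num_val_images, batch_size):
    """Average run_batch() over enough steps to see each validation image once."""
    num_steps = int(math.ceil(num_val_images / float(batch_size)))
    if num_steps == 0:
        raise ValueError("the validation set is empty")
    total = 0.0
    for _ in range(num_steps):
        total += run_batch()
    return total / num_steps, num_steps


# Stand-in for `lambda: sess.run(reduced_loss_val)`, for illustration only.
fake_losses = iter([1.0, 2.0, 3.0])
mean_loss, steps = validate(lambda: next(fake_losses),
                            num_val_images=10, batch_size=4)
print(mean_loss, steps)
```

If the reader shuffles or wraps around at the end of the list, the last batch may repeat some images, so this gives an approximate pass over the data rather than an exact one.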

BCJuan avatar Jun 23 '18 16:06 BCJuan

Hey guys, I would suggest using the tf.data.Dataset API to validate the model during training. I will try to update and clean up the code in the coming days. Thanks @BCJuan for solving this problem!

hellochick avatar Jun 23 '18 16:06 hellochick

Hi @hellochick

Thank you very much for the suggestion and the code. I'll take a look at it and see what I can get, then I'll post.

BCJuan avatar Jun 23 '18 16:06 BCJuan

@BCJuan I am very interested in this code. Can you send the code to this email? [email protected].

yeyuanzheng177 avatar Jun 27 '18 08:06 yeyuanzheng177