ICNet-tensorflow Implement Validation In train.py

Hi @hellochick, I am trying, without any luck, to obtain validation loss results while the model is training.

I have tried to feed the network a batch of images from my validation dataset as in

net = ICNet_BN({'data': image_batch}, is_training=False, num_classes=args.num_classes, filter_scale=args.filter_scale)

Here I passed 'data': image_batch_validation instead, but TensorFlow then says that the variables already exist. I have also tried calling net directly and feeding data to it, but then it says that net is not callable.

I do not know how to obtain, at each step, the loss and metrics on the validation dataset, using the weights the network has at that step and without training on that data.

So the main problem is with feeding the validation data to the net.

Hope I have explained myself properly.

Thank you in advance.

best

May 10 '18 17:05 BCJuan

Hi, if anyone is interested, I managed to include validation a few weeks ago. I did it in a dirty way: creating another net object for validation and copying the weights to it.

I also made these changes:

an option in the argument parser for doing validation or not; just use it as --validation true
a function for copying the weights to the validation network
another data reader for the validation data
a new ICNet network object under the variable scope 'val'
changed the trainable variables and the L2 loss so that they do not take the variables of the validation network into account

Validation loss is calculated each time the model is saved.

I think this is all.

As you might know, this is useful for telling whether you are overfitting or not.

Best

As I cannot attach the code file, I will put the code changes here:

Function for copying

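def print_assign_vars(sess):
    """Copy the current training weights into the matching 'val/...' variables."""
    # Index the training variables by name once instead of rescanning them per variable.
    train_vars = {l.name: l for l in tf.trainable_variables()
                  if not l.name.startswith("val/")}
    for v in tf.global_variables():
        # Only variables created under tf.variable_scope("val") are copy targets.
        if v.name.startswith("val/"):
            f_name = v.name[len("val/"):]  # drop the scope prefix
            if f_name in train_vars:
                # Note: v.assign() adds a new op to the graph on every call.
                sess.run(v.assign(train_vars[f_name]))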
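
The name matching that the copy function above relies on can be tried without TensorFlow. This is a minimal sketch, assuming variable names of the form scope/name:0; the helper pair_val_with_train and the example names are hypothetical and only illustrate how dropping the 'val/' prefix yields the name of the training variable.

```python
def pair_val_with_train(all_names, trainable_names, scope="val"):
    """Return (val_name, train_name) pairs that match once the scope prefix is dropped."""
    prefix = scope + "/"
    # Names of the training variables: everything trainable outside the scope.
    train = {n for n in trainable_names if not n.startswith(prefix)}
    pairs = []
    for name in all_names:
        if name.startswith(prefix):
            stripped = name[len(prefix):]
            if stripped in train:
                pairs.append((name, stripped))
    return pairs


# Hypothetical variable names, for illustration only.
trainable = ["conv1/weights:0", "conv1/biases:0",
             "val/conv1/weights:0", "val/conv1/biases:0"]
everything = trainable + ["conv1/moving_mean:0", "val/conv1/moving_mean:0"]
pairs = pair_val_with_train(everything, trainable)
print(pairs)
```

In this sketch the non-trainable name (the moving mean) finds no partner, because only trainable names are searched. The same would hold for the copy function above if the network has non-trainable variables.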

new argument in parser

    parser.add_argument('--validation', type=str2bool, nargs='?',const=True, default=VALIDATION,
                        help='To make validation')

with this function

def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')
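
To see how the option behaves on the command line, here is a small self-contained sketch. It repeats str2bool so that it runs on its own, and it assumes VALIDATION = False as the default, which the thread does not state.

```python
import argparse


def str2bool(v):
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')


VALIDATION = False  # assumed default, not given in the thread

parser = argparse.ArgumentParser()
parser.add_argument('--validation', type=str2bool, nargs='?', const=True,
                    default=VALIDATION, help='To make validation')

# Passing explicit argument lists keeps the example independent of sys.argv.
print(parser.parse_args([]).validation)                        # False (default)
print(parser.parse_args(['--validation']).validation)          # True (const)
print(parser.parse_args(['--validation', 'true']).validation)  # True
print(parser.parse_args(['--validation', 'no']).validation)    # False
```

With nargs='?' and const=True, the bare flag --validation is enough to switch validation on, while an explicit value is still passed through str2bool.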

The new reader

        reader_2 = ImageReader(
            DATA_DIR_2,
            DATA_LIST_PATH_2,
            input_size,
            args.random_scale,
            args.random_mirror,
            args.ignore_label,
            IMG_MEAN,
            coord)
        image_batch_val, label_batch_val = reader_2.dequeue(args.batch_size)

the new net

    with tf.variable_scope("val"):
        net_val = ICNet_BN({'data': image_batch_val}, is_training=True, num_classes=args.num_classes, filter_scale=args.filter_scale)

changes in trainable variables and L2 losses

    # Exclude the 'val' copies in every case; keep beta/gamma only when args.train_beta_gamma is set.
    all_trainable = [v for v in tf.trainable_variables() if 'val' not in v.name and (('beta' not in v.name and 'gamma' not in v.name) or args.train_beta_gamma)]

    l2_losses = [args.weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables() if ('weights' in v.name and 'val' not in v.name)]
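
One way to write the two selections so that the 'val' copies are left out regardless of the beta/gamma option is sketched below on plain name strings. The helper names and the variable names are hypothetical; only the substring tests ('val', 'beta', 'gamma', 'weights') come from the code above.

```python
def select_for_training(names, train_beta_gamma):
    """Names the optimizer should update: never the 'val' copies, beta/gamma only on request."""
    return [n for n in names
            if 'val' not in n
            and (('beta' not in n and 'gamma' not in n) or train_beta_gamma)]


def select_for_weight_decay(names):
    """Names that contribute to the training L2 term: 'weights' outside the 'val' scope."""
    return [n for n in names if 'weights' in n and 'val' not in n]


# Hypothetical variable names, for illustration only.
names = ["conv1/weights:0", "conv1_bn/gamma:0", "conv1_bn/beta:0",
         "val/conv1/weights:0", "val/conv1_bn/gamma:0"]

print(select_for_training(names, train_beta_gamma=False))
print(select_for_training(names, train_beta_gamma=True))
print(select_for_weight_decay(names))
```

Since these are substring tests, any training variable whose name happens to contain "val" would be dropped as well, so a prefix test on "val/" may be the safer choice.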

loss calculation

    #######################FOR VALIDATION
    
    sub4_out_val = net_val.layers['sub4_out']
    sub24_out_val = net_val.layers['sub24_out']
    sub124_out_val = net_val.layers['conv6_cls']
    
    loss_sub4_val = create_loss(sub4_out_val, label_batch_val, args.num_classes, args.ignore_label)
    loss_sub24_val = create_loss(sub24_out_val, label_batch_val, args.num_classes, args.ignore_label)
    loss_sub124_val = create_loss(sub124_out_val, label_batch_val, args.num_classes, args.ignore_label)
    l2_losses_val = [args.weight_decay * tf.nn.l2_loss(v) for v in tf.trainable_variables() if ('weights' in v.name and 'val' in v.name)]
    
    reduced_loss_val = LAMBDA1 * loss_sub4_val +  LAMBDA2 * loss_sub24_val + LAMBDA3 * loss_sub124_val + tf.add_n(l2_losses_val)
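
As a worked example of how the total is put together, the sketch below computes the same weighted sum on plain numbers. The branch losses, the L2 terms and the three weights are made-up values chosen for illustration; the actual LAMBDA1, LAMBDA2 and LAMBDA3 are whatever train.py defines.

```python
def combined_loss(loss_sub4, loss_sub24, loss_sub124, l2_terms,
                  lambda1, lambda2, lambda3):
    """Weighted sum of the three branch losses plus the summed L2 terms."""
    return (lambda1 * loss_sub4 + lambda2 * loss_sub24
            + lambda3 * loss_sub124 + sum(l2_terms))


# Illustrative numbers only: 0.16*1.0 + 0.4*0.5 + 1.0*0.25 + (0.01 + 0.02)
total = combined_loss(1.0, 0.5, 0.25, [0.01, 0.02],
                      lambda1=0.16, lambda2=0.4, lambda3=1.0)
print(total)
```

Note that, as written above, the validation loss also includes the L2 term of the 'val' weights, so it stays comparable with the training loss.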

And that's it. Hope it helps.

If anyone is interested I can send the code.

May 28 '18 16:05 BCJuan

@BCJuan have you tried to train this on your own dataset?

Jun 18 '18 22:06 PratibhaT

@PratibhaT Yes, I have. I attach an image showing the mIoU as well as the training and validation loss (green for validation, blue for training): [image: loss_pic]

Jun 19 '18 09:06 BCJuan

@BCJuan Can you suggest how to prepare the data for it? I have labeled my dataset, so I have one .json file with the polygon points for the whole training set (made with the VIA annotation tool). But while going through list.txt, I found that the labels are referenced as bitmap .png images. Do I need to prepare my data and labels in a similar format? If yes, how? And if not, how can I train directly with images and .json label files?

Jun 19 '18 18:06 PratibhaT

You should have a .txt file with two columns separated by a space: the first holds the input images and the second the labels in .png format. If what you are asking is how to convert .json files to images, I cannot help you, but I am pretty sure you will find dozens of snippets on the Internet that do that. Best
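
A minimal sketch of that list format, assuming one "image label" pair per line with a single space in between. The function name read_list_file and the file names are hypothetical.

```python
def read_list_file(text):
    """Parse 'image_path label_path' lines into (image, label) pairs."""
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError("line %d: expected 2 columns, got %d"
                             % (line_number, len(parts)))
        pairs.append((parts[0], parts[1]))
    return pairs


# Hypothetical file names, for illustration only.
example = "images/0001.jpg labels/0001.png\nimages/0002.jpg labels/0002.png\n"
pairs = read_list_file(example)
print(pairs)
```

Paths that contain spaces would not fit this layout, so it is safer to avoid them in the dataset.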

Jun 20 '18 10:06 BCJuan

Nice work @BCJuan! I would like to implement your solution as well (I train with my own dataset too). Could you please show me the code where you call sess.run(...) for the val net? And how do you calculate the mIoU during training?

Jun 23 '18 13:06 alexw92

Hi @alexw92

Where you have the sess.run(reduced_loss,...) add something like

if args.validation:
    loss_value_val = sess.run(reduced_loss_val)

This is to add validation loss and to be able to record it.

For mIoU, put a statement after the outputs of the layers, like:

mIoU, update_op_m = tf.metrics.mean_iou(good_label_re, good_pred_re, num_classes=args.num_classes)

where good_label_re and good_pred_re are just the outputs prepared for evaluation of the metric. You have examples of this both in the training code and in the evaluation code.
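
For reference, the quantity behind the metric can be sketched in plain Python: build a confusion matrix from flat label and prediction lists, take TP / (TP + FP + FN) per class, and average over the classes that occur. This is an illustration of the formula under that assumption, not the TensorFlow implementation; the function name mean_iou and the sample data are made up.

```python
def mean_iou(labels, predictions, num_classes):
    """Mean intersection-over-union from flat lists of integer class ids."""
    # confusion[t][p]: number of pixels with true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, predictions):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                 # true c, predicted otherwise
        fp = sum(row[c] for row in confusion) - tp  # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / float(denom))
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3
value = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(value)
```

In the TensorFlow version, update_op_m is the op that accumulates the confusion matrix and mIoU reads the current value, so the update op has to be run on the batches first.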

Jun 23 '18 15:06 BCJuan

@BCJuan Thank you, I found the lines in evaluation and it should be easy to get this working.

Did you call sess.run(reduced_loss_val) in a loop running num_val_images/batch_size times in order to validate the model on the whole validation set? I would like to iterate over the whole val set, similar to the code in evaluate.py, but I don't know how to do this using ImageReader.

Jun 23 '18 15:06 alexw92

Nice point @alexw92

No, I do not iterate through the whole validation dataset, just a batch of it. I do not really know how to pass the entire data set. I think you would have to mount another reader. I just do not know. But good point. I will appreciate it very much if you achieve it and let me know about it. Thank u.
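
A rough sketch of the loop being discussed, assuming a callable that evaluates the validation loss on one batch (in train.py that could be something like lambda: sess.run(reduced_loss_val)). The function average_validation_loss and the loss values are hypothetical. Whether every image is seen exactly once depends on how the reader orders and wraps the data, which this sketch does not address.

```python
import math


def average_validation_loss(run_one_batch, num_val_images, batch_size):
    """Run enough batches to cover the set once and return the mean loss."""
    if num_val_images <= 0 or batch_size <= 0:
        raise ValueError("num_val_images and batch_size must be positive")
    num_steps = int(math.ceil(num_val_images / float(batch_size)))
    total = 0.0
    for _ in range(num_steps):
        total += run_one_batch()
    return total / num_steps


# Stand-in for the session call; returns made-up loss values.
fake_losses = iter([0.9, 0.7, 0.8])
mean_loss = average_validation_loss(lambda: next(fake_losses),
                                    num_val_images=5, batch_size=2)
print(mean_loss)
```

With 5 images and a batch size of 2 the loop runs 3 times, so the last batch necessarily contains an image that was already seen or is padded by the reader.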

Jun 23 '18 16:06 BCJuan

Hey guys, I would suggest using the tf.data.Dataset API to validate the model during training. I will try to update and clean up the code in the coming days. Thanks @BCJuan for solving this problem!

Jun 23 '18 16:06 hellochick

Hi @hellochick

Thank you very much for the suggestion and the code. I'll take a look at it and see what I can get, then I'll post.

Jun 23 '18 16:06 BCJuan

@BCJuan I am very interested in this code. Can you send the code to this email? [email protected].

Jun 27 '18 08:06 yeyuanzheng177
