
Prediction: Inference resolution vs Output resolution


Calling a prediction on a model gives two different resolutions.

from keras_segmentation.models.unet import vgg_unet

model = vgg_unet(n_classes=2, input_height=1280, input_width=640)
model.load_weights('myweights1280x640')
outcome = model.predict_segmentation(
    inp='input.png',
    out_fname='out.png')

This results in the array "outcome" having exactly half the resolution of the file "out.png": the file is 1280x640, while the array is "just" 640x320.

This is very weird behaviour, especially since comparison shows that the out.png data appears to be upscaled from the data in the variable outcome.
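
A quick way to see the mismatch, assuming the input/output size attributes that keras_segmentation attaches to its models (and reusing model and outcome from the snippet above):

# Sketch: the model's nominal input size vs. the size it actually predicts at
# (attribute names assumed from keras_segmentation's get_segmentation_model)
print(model.input_height, model.input_width)    # 1280 640
print(model.output_height, model.output_width)  # 640 320
print(outcome.shape)                            # (640, 320) -- matches the output size, not the input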

Why is that?

If it were possible, I would like to include an upsampling layer in the model so the result is exactly the same size as the input, but it seems that for training I cannot modify the model after calling model = vgg_unet(n_classes=2, input_height=1280, input_width=640).
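
In the meantime the prediction can be resized back up outside the model; a minimal sketch, assuming OpenCV is available and reusing model and outcome from above:

import cv2
import numpy as np

# Upscale the predicted class map back to the input resolution.
# Nearest-neighbour interpolation keeps the class ids discrete
# (no blended, non-integer labels).
full_res = cv2.resize(outcome.astype(np.uint8),
                      (model.input_width, model.input_height),  # cv2 takes (w, h)
                      interpolation=cv2.INTER_NEAREST)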

GGDRriedel avatar Sep 23 '20 13:09 GGDRriedel

I observed the same problem, and the encoder_level parameter does not seem to work.

puzhao8 avatar Sep 29 '20 19:09 puzhao8

Hello all, I found the same problem and created a new model with an extra upscaling stage; I just copied and pasted the previous layers, and it seems to work.

I called it "unet_896" as I am using 896x896 input images:

# Imports as in keras_segmentation's own unet.py (module paths assumed):
from keras.layers import (ZeroPadding2D, Conv2D, BatchNormalization,
                          UpSampling2D, concatenate)
from keras_segmentation.models.config import IMAGE_ORDERING
from keras_segmentation.models.model_utils import get_segmentation_model

MERGE_AXIS = -1 if IMAGE_ORDERING == 'channels_last' else 1


def _unet_896(n_classes, encoder, l1_skip_conn=True, input_height=416,
              input_width=608):
    img_input, levels = encoder(
        input_height=input_height, input_width=input_width)
    [f1, f2, f3, f4, f5] = levels

    o = f4
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(512, (3, 3), padding='valid', activation='relu',
                data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f3], axis=MERGE_AXIS))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(256, (3, 3), padding='valid', activation='relu',
                data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f2], axis=MERGE_AXIS))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(128, (3, 3), padding='valid', activation='relu',
                data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)

    if l1_skip_conn:
        o = (concatenate([o, f1], axis=MERGE_AXIS))

    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(64, (3, 3), padding='valid', activation='relu',
                data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    # Extra upsampling stage, copied from the block before l1_skip_conn,
    # to bring the output back to the full input resolution.
    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    # o = (concatenate([o, f2], axis=MERGE_AXIS))  # this was a skip, removed:
    # there is no encoder feature map at this resolution
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(64, (3, 3), padding='valid', activation='relu',
                data_format=IMAGE_ORDERING))(o)  # 64 filters (half the previous)
    o = (BatchNormalization())(o)

    o = Conv2D(n_classes, (3, 3), padding='same',
               data_format=IMAGE_ORDERING)(o)

    model = get_segmentation_model(img_input, o)

    return model
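
To try it, here is a hypothetical usage sketch that wires the decoder to the library's VGG16 encoder (get_vgg_encoder and the model.train signature are taken from keras_segmentation; the data paths are placeholders):

from keras_segmentation.models.vgg16 import get_vgg_encoder

model = _unet_896(n_classes=2, encoder=get_vgg_encoder,
                  input_height=896, input_width=896)
model.train(train_images="images_prepped_train/",
            train_annotations="annotations_prepped_train/",
            checkpoints_path="/tmp/unet_896", epochs=5)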

acobo avatar Oct 13 '20 20:10 acobo

Nice @acobo

o = (Conv2D(64, (3, 3), padding='valid', activation='relu', data_format=IMAGE_ORDERING))(o)  # 64 filters (half the previous)

I think you meant 32 instead of 64? (The layer right before the last one in the original paper also had 64.) Anyway, I was playing with this code as well, and I am having difficulty understanding the implementation. To me it seems a bit far from the original paper: https://arxiv.org/abs/1505.04597

For instance, the fact that the U-Net here simply doesn't produce an image at the same resolution as the input with the VGG16 encoder (OK, maybe this was a small mistake that has to do with the decoder also having to be compatible with other encoders?).

From the original paper, you'll see that each upsampling stage is: upsampling -> 2x2 conv, stride 1 (same) -> concat -> 3x3 conv, stride 1 (valid) -> ReLU -> 3x3 conv, stride 1 (same) -> ReLU, generally halving the number of channels in that first 3x3 conv.

They used unpadded convolutions (resulting in a smaller output image) versus the padded convolutions here, which I think is a good design choice in a general framework. They also used no batch norm, versus conv+ReLU+BN here; I think that was a good choice as well, and it was probably omitted from the U-Net paper because BN was published the same year (2015). What bothers me the most, in the end, is why it was decided to use only one 3x3 conv per stage instead of two as in the original paper. To me it makes the network quite shallow and not really a mirror of the encoder's complexity (three convs per block in VGG16).
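
To make the comparison concrete, here is a rough sketch of one paper-style decoder stage (up_block_paper is a hypothetical helper; padded convolutions so resolutions stay aligned, unlike the paper's unpadded ones; IMAGE_ORDERING and MERGE_AXIS as in the code above):

from keras.layers import Conv2D, UpSampling2D, concatenate

def up_block_paper(x, skip, filters):
    # "up-convolution": upsample, then a 2x2 conv that halves the channels
    x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(x)
    x = Conv2D(filters, (2, 2), padding='same', activation='relu',
               data_format=IMAGE_ORDERING)(x)
    x = concatenate([x, skip], axis=MERGE_AXIS)
    # two 3x3 conv + ReLU blocks per stage, as in the original paper
    x = Conv2D(filters, (3, 3), padding='same', activation='relu',
               data_format=IMAGE_ORDERING)(x)
    x = Conv2D(filters, (3, 3), padding='same', activation='relu',
               data_format=IMAGE_ORDERING)(x)
    return x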

I would love to hear from someone with extensive experience comparing various combinations of these design choices.

gnthibault avatar Nov 17 '20 15:11 gnthibault