pytorch-semseg

Size inconsistency in U-Net implementation.

Open xiaofengqing opened this issue 8 years ago • 17 comments

When I train the unet model, I get this error: RuntimeError: inconsistent tensor sizes at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu:141

My input image size is 256x256.

xiaofengqing avatar Nov 23 '17 13:11 xiaofengqing

I also have the exact same issue. Can anyone help me out?

shehabk avatar Nov 25 '17 22:11 shehabk

```python
def __getitem__(self, index):
    img_name = self.files[self.split][index]
    img_path = self.root + '/' + self.split + '/' + img_name
    lbl_path = self.root + '/' + self.split + 'annot/' + img_name
    print(img_path)
    print(lbl_path)

    img = m.imread(img_path)
    img = m.imresize(img, [360, 480], interp='nearest')  # add this line
    img = np.array(img, dtype=np.uint8)

    lbl = m.imread(lbl_path)
    lbl = m.imresize(lbl, [360, 480], interp='nearest')  # add this line
    lbl = np.array(lbl, dtype=np.int32)
    print(lbl.shape)
```

hexiangquan avatar Nov 30 '17 09:11 hexiangquan

This resizing of the image did not work for me; I still get the same error. Does the current implementation of unet work with (256, 256)? If not, what image size should be used?

shehabk avatar Dec 06 '17 17:12 shehabk

I have the same problem. Did anyone find the solution?

bobbqe avatar Dec 07 '17 13:12 bobbqe

The problem is that this unet implementation does not use any padding in its convolution layers, so the output size is not equal to the input size, while the label size equals the input size.
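For example, here is a minimal standalone sketch (not the repo's code) of the effect:

```python
import torch
import torch.nn as nn

# A 3x3 conv with padding=0 shrinks each spatial dim by 2,
# while padding=1 preserves the input size.
x = torch.randn(1, 3, 256, 256)
print(nn.Conv2d(3, 8, 3, padding=0)(x).shape)  # torch.Size([1, 8, 254, 254])
print(nn.Conv2d(3, 8, 3, padding=1)(x).shape)  # torch.Size([1, 8, 256, 256])
```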

mileyan avatar Dec 22 '17 18:12 mileyan

I'm aware of this issue; the U-net implementation doesn't support all resolutions. I need to fix this.

meetps avatar Dec 28 '17 16:12 meetps

Setting padding to 1 instead of 0 worked for me.

masahi avatar Jan 14 '18 23:01 masahi

@masahi OMG.. you are the winner.. It works fine, but I should check the result images after training.

JustWon avatar Jan 16 '18 07:01 JustWon

@masahi After training the unet, I ran validate.py, but the following error occurred.

[screenshot of the error]

JustWon avatar Jan 17 '18 01:01 JustWon

@JustWon That error is not related to your change in padding; look elsewhere.

masahi avatar Jan 17 '18 03:01 masahi

Maybe I'm late to the discussion, but since I've PR'd the U-net fix (#35; see issue #21), here are my comments.

A strict U-net implementation does not use padding (see Fig. 1 in https://arxiv.org/pdf/1505.04597.pdf), which is why padding is 0 instead of 1. Several other implementations follow this (TF#1, TF#2; note the "valid" padding). So the input size should be 572x572, and the output size should be 388x388.
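As a quick sanity check, a minimal standalone sketch (not the repo's code) tracing the spatial size through a depth-4 valid-padding U-Net reproduces those numbers:

```python
# Trace one spatial dimension through the paper's architecture.
def unet_output_size(n, depth=4):
    for _ in range(depth):  # contracting path
        n -= 4              # two 3x3 "valid" convs, -2 each
        n //= 2             # 2x2 max pool; an odd n silently loses a pixel here
    n -= 4                  # two bottleneck convs
    for _ in range(depth):  # expanding path
        n = n * 2 - 4       # 2x2 up-conv doubles, then two "valid" convs
    return n

print(unet_output_size(572))  # 388, matching the paper
print(unet_output_size(256))  # 68; odd intermediate sizes (61, 57) get floored
                              # by pooling, which is where the skip-connection
                              # shape mismatches come from
```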

So the easiest method would be resizing the input and output images to match the respective sizes.

Using padding wouldn't hurt, since it nicely preserves the size, but it is not the exact architecture from the paper, so use it at your own risk with regard to proper benchmarks.

A quick "fix" would be to raise a readable error when the I/O sizes don't match, or to add an on/off switch for the padding.
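A hypothetical sketch of such a switch (not the repo's code) could expose a padding flag on the double-conv block, so padding=0 gives the paper's "valid" convs and padding=1 keeps the spatial size:

```python
import torch.nn as nn

# Hypothetical double-conv block with a padding on/off switch.
class DoubleConv(nn.Module):
    def __init__(self, in_ch, out_ch, padding=0):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=padding),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=padding),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```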

L0SG avatar May 03 '18 08:05 L0SG

@L0SG Hi, thanks for your explanation.

I am confused about the input size and output size. According to the paper, it uses the overlap-tile strategy for segmentation of arbitrarily large images. Does this mean that we shouldn't resize the label image, but instead select a part of the label image (388x388) and mirror the corresponding real image (388x388 -> 572x572)?

I am new to segmentation. Does changing the label size have only a small effect on the final accuracy? Also, when doing data augmentation, should we use different resize methods for the input image and the label? (https://github.com/pytorch/vision/issues/9#issuecomment-294629198 says the input image uses bilinear while the label uses nearest-neighbour.)

irexyc avatar Jul 18 '18 09:07 irexyc

@irexyc Yes, you're right. For the net to utilize the "valid" padding strategy of convolutions, you may want to tile the (388x388) image to a shape of 572x572, as in Fig. 2 of the paper (the word "resize" in my previous comment is a bit of a misnomer; I use the model with "tiled" CT scan images). This shows an example with mirror-padding, which may further clarify the I/O.
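In NumPy terms, the mirror-padding step might look like this (a minimal sketch with a dummy array):

```python
import numpy as np

# Mirror-pad a 388x388 tile by 92 pixels per side
# to get the 572x572 network input.
tile = np.random.rand(388, 388)
pad = (572 - 388) // 2                         # 92
net_input = np.pad(tile, pad, mode='reflect')
print(net_input.shape)                         # (572, 572)
```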

I think "bilinear for the input image & nearest-neighbor for the binary segmentation mask" is a general practice, since bilinear provides a more natural, smooth interpolation for images, while we want to keep the mask binary rather than interpolate it.
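For example, with PIL (the file names here are hypothetical):

```python
from PIL import Image

img = Image.open('img.png')
lbl = Image.open('lbl.png')
img = img.resize((480, 360), Image.BILINEAR)  # smooth interpolation for the image
lbl = lbl.resize((480, 360), Image.NEAREST)   # keep label values discrete
```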

L0SG avatar Jul 19 '18 00:07 L0SG

> setting padding to 1 instead of 0 worked for me.

Hello, I have the same problem! How do I set padding to 1?

lfdeep avatar Nov 14 '18 10:11 lfdeep

Did anyone find a solution? Please help me; I'm new to machine learning and getting the same error.

shariq-ali avatar Nov 29 '19 11:11 shariq-ali

TL;DR: The size inconsistency is NOT an issue with the U-Net implementation if you follow the original version from the paper referenced above. The original paper uses a mirror-tile strategy on the input images to yield the desired output dimension.

Source: https://arxiv.org/pdf/1505.04597.pdf [figure from the paper]

alar0330 avatar Mar 09 '20 11:03 alar0330

@lfdeep

Change the padding in lines 174-183 of utils.py, in the unetConv2 function:

```python
if is_batchnorm:
    self.conv1 = nn.Sequential(
        nn.Conv2d(in_size, out_size, 3, 1, 1), nn.BatchNorm2d(out_size), nn.ReLU()
    )
    self.conv2 = nn.Sequential(
        nn.Conv2d(out_size, out_size, 3, 1, 1), nn.BatchNorm2d(out_size), nn.ReLU()
    )
else:
    self.conv1 = nn.Sequential(nn.Conv2d(in_size, out_size, 3, 1, 1), nn.ReLU())
    self.conv2 = nn.Sequential(nn.Conv2d(out_size, out_size, 3, 1, 1), nn.ReLU())
```

Make sure to check with the summary function that this is what you want to do.
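Assuming the "summary function" refers to something like the torchsummary package, the check might look like this (constructor arguments are illustrative, not verified against the repo):

```python
from torchsummary import summary
from ptsemseg.models.unet import unet

# After switching the convs to padding=1, the final layer's
# spatial size should match the 256x256 input.
model = unet(n_classes=21, in_channels=3)
summary(model, input_size=(3, 256, 256), device="cpu")
```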

ckolluru avatar May 16 '20 20:05 ckolluru