
DNN Support for dilated convolutions

Open Mut1nyJD opened this issue 7 years ago • 15 comments

Hello all! First I want to send my gratitude to all the people working on this great library. It is one of my favourite ML/image processing libraries out there. I especially like the DNN part: it is so much easier to use than some of the others out there, and being able to drive it directly from C++ is great! However, I did notice there are some holes in functionality, and I was wondering in particular if there is any future plan to add a dilated convolution option to the current DNN convolution layer. I hacked in some support myself to see if it makes a difference, changing the dnn_segmentation example to use it. Unfortunately my hack would require all examples to be rewritten, as there are now two additional parameters in the con layer.

Here are some example images from a model trained on a head segmentation dataset, tested on some celebA images with and without dilation. You can see some decent improvements just from using dilation (that's the only change; I didn't touch the network structure itself so far).

[Four image pairs comparing non-dilated vs. dilated segmentation results: non-dilated/dilated examples 1 through 4]

Mut1nyJD avatar Dec 17 '17 10:12 Mut1nyJD

Thanks, glad you like dlib :)

I'm not sure, I think @OranjeeGeneral was maybe adding one? It's definitely something that would be nice to have.

davisking avatar Dec 17 '17 13:12 davisking

Hello Davis, thanks for your answer. I see. Well, I can offer my changes if that helps, but as I said I don't see how they can be added without having to change every example out there, as I had to add two template parameters to the con layer, and unlike methods/functions you cannot have const literal values in templates. This makes it sound like a big breaking change. Also, it only works with cuDNN 6 and higher.

Mut1nyJD avatar Dec 18 '17 13:12 Mut1nyJD

Right, don't submit that in a PR. If there really is no way to upgrade the con layer without breaking existing stuff, then the right thing to do is to just add a new layer type for this. Although I would expect you could just add two new optional template arguments to con. You can certainly have default literal values in templates; the existing con layer already has them, even.

davisking avatar Dec 18 '17 14:12 davisking

Ah, silly me, of course. I just have to move the new parameters to the end of the template definition, so that they are the only ones with default values, and then it works without having to change the existing examples and so on.

But the other issue, that it only works with cuDNN >= 6, still remains, as only those versions changed the convolution descriptor to support it. Is that an issue? The problem, I think, is that the preprocessor definition is currently not reachable in layers.h.

Mut1nyJD avatar Dec 20 '17 10:12 Mut1nyJD

That's fine. People can get a runtime error if they pass a dilation value other than 1 and are using an old version of cuDNN.

davisking avatar Dec 20 '17 10:12 davisking

I haven't forgotten this, but I was thinking to put in a pull request once I have made some other changes as well to improve the pixel segmentation neural networks. For example, I think the asserts in the per_pixel_log loss are too tight: they currently allow only a very limited number of input training sizes, especially since in encoder/decoder networks you can't really guarantee that the decoder part returns the same RxC dimensions for the output as the input had. I think this needs to be handled better, otherwise it is a bit useless.

Mut1nyJD avatar Jan 07 '18 16:01 Mut1nyJD

You could use a final layer that resizes the output tensor to whatever size you want. So for instance, you could trivially make a layer like the upsample layer that resized to some user-specified output size, or even just to the input size or whatever.

Also, what would you do in the loss if the output tensor isn't the same size as the input tensor? How would you compute the loss? It seems like the most sensible thing to do would be to interpolate the output to the input size in the loss. But if that's what we are talking about then you might as well just add a final layer that did that and leave the current loss alone since it's much simpler to understand when there is a one-to-one mapping between output and inputs.

davisking avatar Jan 07 '18 19:01 davisking

I also generally prefer pull requests that address one issue at a time rather than huge pull requests that do a bunch of things at once. The smaller ones are easier to review.

davisking avatar Jan 07 '18 19:01 davisking

Okay I will leave it out then.

Well, you could use a resize op, I guess, but then you would have to change the layout of the evaluation network, or add an option to ignore the resize. The resize op would have to take the label image's size as input, since it often happens that one dimension of the output image is smaller and the other larger than the label image. Alternatively, if you do it in the loss function, you could assign one of the ignore labels (or, for example, the background label) to data that isn't covered by the label map.

Mut1nyJD avatar Jan 08 '18 21:01 Mut1nyJD

Aren't you training with mini-batches where the images are all the same size? You should know the size during training, making setting the output size easy since you know it.

Also, you can make a resize layer that simply forces the output to be the same size as the input. Then there isn't anything to set since the layer can just look at the input layer to see the right dimensions.

davisking avatar Jan 09 '18 00:01 davisking

I'm also very interested in having a dilated convolution feature.

I'm looking at the code and it looks easy; I just want to make sure. Basically, the only thing to do is here, at cudnn_dlibapi.cpp +875: add some int dilate_x, dilate_y parameters exposed in the con_ layer, right?

So basically instead of having this:

                CHECK_CUDNN(cudnnSetConvolution2dDescriptor((cudnnConvolutionDescriptor_t)conv_handle,
                        padding_y, // vertical padding
                        padding_x, // horizontal padding
                        stride_y,
                        stride_x,
                        1, 1, // must be 1,1
                        CUDNN_CROSS_CORRELATION)); // could also be CUDNN_CONVOLUTION

I would need something like:

                CHECK_CUDNN(cudnnSetConvolution2dDescriptor((cudnnConvolutionDescriptor_t)conv_handle,
                        padding_y, // vertical padding
                        padding_x, // horizontal padding
                        stride_y,
                        stride_x,
                        dilate_y, dilate_x,
                        CUDNN_CROSS_CORRELATION)); // could also be CUDNN_CONVOLUTION

Am I missing something, or is this the idea?

Thanks,

edubois avatar Mar 06 '18 22:03 edubois

@edubois

Unfortunately it isn't as easy as that. First, this only adds support on the CUDA path, so you need a fallback for the CPU; the easiest choice would be to only support dilation factors of 1,1 there. Also, you need to add more parameters to the layer, with defaults, otherwise all existing network definitions would stop working when they encounter the new conv layer.

But it gets worse: unfortunately, dilation also has an impact on the padding of the convolution layer. The padding needs to be enlarged by the dilation factor, and here the real problem comes in: you can then no longer use CUDA-trained models without CUDA, as the serialization of the model doesn't match up.

So it is all a bit of a mess. I guess the best solution is to add CPU support for dilation as well and then all these problems go away.

Mut1nyJD avatar Mar 18 '18 14:03 Mut1nyJD

Yes, the cuda side is easy. But a complete PR needs to include a CPU implementation.

davisking avatar Mar 18 '18 15:03 davisking

Well, I guess if speed isn't such a major concern, the easiest way to do the CPU implementation would be to enlarge the convolution kernel with zeros. But dilation, I now realize, is pretty crucial for improving quality in segmentation networks, especially in the lower layers (have a look at DeepLab, for example), even though the folks at Google call it atrous convolution.

Mut1nyJD avatar Mar 18 '18 21:03 Mut1nyJD

Don't do that. Speed isn't a huge deal, but let's not be crazy :)

davisking avatar Mar 18 '18 21:03 davisking