DNN Support for dilated convolutions
Hello all! First I want to send my gratitude to all the people working on this great library. It is one of my favourite ML/image-processing libraries out there. I especially like the DNN part: it is so much easier to use than some of the others out there, and being able to drive it directly from C++ is great! However, I did notice there are some holes in functionality, and I was wondering in particular whether there is any future plan to add a dilated convolution option to the current DNN convolution layer. I hacked in some support myself to see if it makes a difference, so I changed the dnn_segmentation example to add it. Unfortunately, my hack would require all examples to be rewritten, as there are now two additional parameters in the con layer.
Here are some example images from a model trained on a head-segmentation dataset, tested on some celebA images with and without dilation. You can see some decent improvements just from using dilation (that's the only change; I haven't touched the network structure itself so far).
[Four image pairs: non-dilated vs. dilated segmentation results, examples 1-4]
Thanks, glad you like dlib :)
I'm not sure, I think @OranjeeGeneral was maybe adding one? It's definitely something that would be nice to have.
Hello Davis, thanks for your answer. I see. Well, I can offer my changes if that helps, but as I said, I don't see how they can be added without having to change every example out there, as I had to add two template parameters to the con layer and, unlike with methods/functions, you can not have const literal values in templates. This makes it sound like a big breaking change. Also, it only works with cuDNN 6 and higher.
Right, don't submit that in a PR. If there really is no way to upgrade the con layer without breaking existing stuff, then the right thing to do is to just add a new layer type for this. Although I would expect you could just add two new optional template arguments to con. You can certainly have default literal values in templates; the existing con layer already has them, even.
Ah, silly me, of course: I just have to move the new parameters to the end of the template definition, since they are the only ones with default values, and then it works without having to change the existing examples and so on.
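In outline, the idea looks something like this (a rough sketch with illustrative names, not dlib's actual con_ declaration, which also defaults its padding parameters):

// Sketch: trailing non-type template parameters can take default
// values, so existing network definitions keep compiling unchanged.
template <
    long num_filters,
    long nr,
    long nc,
    int stride_y,
    int stride_x,
    int dilate_y = 1, // new, trailing, defaulted
    int dilate_x = 1
    >
class con_
{
    // ...
};

// Existing style still compiles and gets dilation 1,1:
//   con_<32,3,3,1,1> a;
// New code can opt in:
//   con_<32,3,3,1,1,2,2> b;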
But the other issue still remains: it only works with cuDNN >= 6, since only those versions changed the convolution descriptor to support dilation. Is that an issue? The problem, I think, is that the preprocessor definition is currently not reachable in layers.h.
That's fine. People can get a runtime error if they pass a dilation value other than 1 and are using an old version of cuDNN.
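A minimal sketch of that runtime check (assuming it sits in dlib's cuDNN wrapper next to the descriptor setup, and using cuDNN's own version macro):

#if CUDNN_MAJOR < 6
    // Old cuDNN versions can't do dilated convolutions,
    // so reject anything other than 1,1 at runtime.
    DLIB_CASSERT(dilate_y == 1 && dilate_x == 1,
        "Dilated convolutions require cuDNN version 6 or newer.");
#endif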
I haven't forgotten this, but I was thinking of putting in a pull request once I have made some other changes as well to improve the pixel segmentation networks. For example, I think the asserts in the per_pixel_log function are too tight: they currently allow only a very limited number of input training sizes, especially since in encoder/decoder networks you can't really guarantee that the decoder part returns the same RxC dimensions for the output as the input had. I think this needs to be handled better, otherwise it is a bit useless.
You could use a final layer that resizes the output tensor to whatever size you want. For instance, you could trivially make a layer like the upsample layer that resizes to some user-specified output size, or even just to the input size, or whatever.
Also, what would you do in the loss if the output tensor isn't the same size as the input tensor? How would you compute the loss? It seems like the most sensible thing to do would be to interpolate the output to the input size in the loss. But if that's what we are talking about, then you might as well just add a final layer that does that and leave the current loss alone, since it's much simpler to understand when there is a one-to-one mapping between outputs and inputs.
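To make that concrete, here is a rough sketch of such a layer, modeled on the existing upsample layer (a sketch under assumptions, not tested code: serialization, printing, and the other layer boilerplate are omitted, and tt::resize_bilinear / tt::resize_bilinear_gradient are the same tensor tools upsample uses):

#include <dlib/dnn.h>
using namespace dlib;

// A layer that bilinearly resizes its input to a fixed,
// user-specified NR x NC output size. Sketch only.
template <long NR, long NC>
class resize_to_
{
public:
    template <typename SUBNET>
    void setup(const SUBNET& /*sub*/) {}

    template <typename SUBNET>
    void forward(const SUBNET& sub, resizable_tensor& output)
    {
        const tensor& in = sub.get_output();
        output.set_size(in.num_samples(), in.k(), NR, NC);
        tt::resize_bilinear(output, in);
    }

    template <typename SUBNET>
    void backward(const tensor& gradient_input, SUBNET& sub, tensor& /*params_grad*/)
    {
        // Interpolates the gradient back down/up to the input's size
        // and accumulates it into the input's gradient.
        tt::resize_bilinear_gradient(sub.get_gradient_input(), gradient_input);
    }

    const tensor& get_layer_params() const { return params; }
    tensor& get_layer_params() { return params; }

private:
    resizable_tensor params; // no learnable parameters
};

Dropped in as the last layer via the usual add_layer alias, something like this would pin the network's output rows/columns to NR x NC no matter what the decoder produces.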
I also generally prefer pull requests that address one issue at a time rather than huge pull requests that do a bunch of things at once. The smaller ones are easier to review.
Okay I will leave it out then.
Well, you could use a resize op, I guess, but then you would have to change the layout of the evaluation network, or put in an option to ignore the resize. The resize op would have to take the label image's size as an input, since it often happens that one dimension of the output image is smaller and the other larger than the label image. Alternatively, if you do it in the loss function, you could switch to one of the ignore labels, or for example the background label, for data that isn't covered by the label map.
Aren't you training with mini-batches where the images are all the same size? You know the size during training, so setting the output size is easy.
Also, you can make a resize layer that simply forces the output to be the same size as the input. Then there isn't anything to set since the layer can just look at the input layer to see the right dimensions.
I'm also very interested in having a dilated convolution feature.
I'm looking at the code and it looks easy; I just want to make sure. Basically, the only thing to do is here, at cudnn_dlibapi.cpp +875: add int dilate_x, dilate_y parameters and expose them in the con_ layer, right?
So basically instead of having this:
CHECK_CUDNN(cudnnSetConvolution2dDescriptor((cudnnConvolutionDescriptor_t)conv_handle,
padding_y, // vertical padding
padding_x, // horizontal padding
stride_y,
stride_x,
1, 1, // must be 1,1
CUDNN_CROSS_CORRELATION)); // could also be CUDNN_CONVOLUTION
I would need something like:
CHECK_CUDNN(cudnnSetConvolution2dDescriptor((cudnnConvolutionDescriptor_t)conv_handle,
padding_y, // vertical padding
padding_x, // horizontal padding
stride_y,
stride_x,
dilate_y, dilate_x, // dilation factors (cuDNN >= 6)
CUDNN_CROSS_CORRELATION, // could also be CUDNN_CONVOLUTION
CUDNN_DATA_FLOAT)); // note: cuDNN 6's version of this call also takes a compute type
Am I missing something, or is this the idea?
Thanks,
@edubois
Unfortunately it isn't as easy as that. First, this only adds support on the CUDA path, so you need a fallback for the CPU; the easiest choice would be to only support dilation factors of 1,1 there. You also need to add the new parameters to the layer with defaults, otherwise all existing network definitions would stop working when they encounter the new con layer.
But it gets worse: unfortunately, dilation also has an impact on the padding of the convolution layer. The padding needs to be enlarged by the dilation factor, and here the real problem comes in: you can then no longer use CUDA-trained models without CUDA, as the serialization of the model doesn't match up.
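The padding arithmetic behind this is easy to check: a dilated kernel covers a wider window, so "same"-style padding has to grow with the dilation. A tiny standalone snippet to show the numbers (my own illustration, not dlib code):

#include <cstdio>

// A k-tap kernel with taps spaced 'd' pixels apart spans this many pixels.
int effective_kernel(int k, int d) { return d * (k - 1) + 1; }

int main()
{
    for (int d = 1; d <= 3; ++d)
        std::printf("k=3, dilation=%d: covers %d pixels, same-padding %d\n",
                    d, effective_kernel(3, d), (effective_kernel(3, d) - 1) / 2);
    // k=3, dilation=1: covers 3 pixels, same-padding 1
    // k=3, dilation=2: covers 5 pixels, same-padding 2
    // k=3, dilation=3: covers 7 pixels, same-padding 3
}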
So it is all a bit of a mess. I guess the best solution is to add CPU support for dilation as well and then all these problems go away.
Yes, the CUDA side is easy. But a complete PR needs to include a CPU implementation.
Well, I guess if speed isn't such a major concern, the easiest way to do the CPU implementation would be to enlarge the convolution kernel with zeros. But I realize now that dilation is pretty crucial for improving the quality of segmentation networks, especially in the lower layers (have a look at DeepLab, for example), even though the folks at Google call it atrous convolution.
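For concreteness, the zero-stuffing idea would amount to something like this (an illustrative sketch with a hypothetical helper, not a proposed patch; it expands a kernel to its d*(k-1)+1 equivalent, which gives the right answer but is wasteful since most taps are zero, as the reply below points out):

#include <vector>

// Expand an n x n kernel (row-major) to its dilated equivalent by
// inserting d-1 zeros between neighbouring taps, then convolve normally.
std::vector<float> zero_stuff(const std::vector<float>& k, int n, int d)
{
    const int m = d * (n - 1) + 1;        // effective kernel size
    std::vector<float> out(m * m, 0.0f);  // mostly zeros
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            out[(r * d) * m + (c * d)] = k[r * n + c];
    return out;
}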
Don't do that. Speed isn't a huge deal, but let's not be crazy :)