Checkerboard artifact free sub-pixel convolution
Would be great to implement the initialization scheme proposed in https://arxiv.org/abs/1707.02937 and recommend it in the TransposedConv2DLayer. Could be a "meta-initializer" that can be used to wrap any of the existing initializers (wrapping GlorotUniform by default). Should tackle #858 on the way (i.e., ensure the gain is chosen correctly -- I think this would come for free).
The implementation would be something like:
```python
import numpy as np
from lasagne.init import Initializer, GlorotUniform
from lasagne.utils import as_tuple

class ICNR(Initializer):
    def __init__(self, stride, initializer=GlorotUniform()):
        self.stride = stride
        self.initializer = initializer

    def sample(self, shape):
        stride = as_tuple(self.stride, len(shape) - 2)
        # subkernel shape, scaled down by the stride (ceil division,
        # so the upscaled kernel covers the full target shape)
        subshape = shape[:2] + tuple(-(-l // s)
                                     for l, s in zip(shape[2:], stride))
        result = self.initializer(subshape)
        # upscale by nearest-neighbor replication along each spatial axis
        for d, s in enumerate(stride):
            result = np.repeat(result, s, axis=2 + d)
        # crop any overshoot from the ceil division
        return result[(slice(None), slice(None)) +
                      tuple(slice(None, l) for l in shape[2:])]
```
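As a self-contained sanity check, here is the same sampling logic in pure NumPy (no Lasagne dependency; `icnr_sample` is a hypothetical stand-in for the class above, with a Gaussian default instead of GlorotUniform):

```python
import numpy as np

def icnr_sample(shape, stride, initializer=np.random.standard_normal):
    """Draw weights for a subkernel scaled down by `stride`, then
    upscale to the full kernel by nearest-neighbor replication."""
    stride = (stride,) * (len(shape) - 2) if np.isscalar(stride) else tuple(stride)
    # ceil division so the upscaled kernel covers the full target shape
    subshape = shape[:2] + tuple(-(-l // s) for l, s in zip(shape[2:], stride))
    result = initializer(subshape)
    for d, s in enumerate(stride):
        result = np.repeat(result, s, axis=2 + d)
    # crop any overshoot from the ceil division
    return result[(slice(None), slice(None)) +
                  tuple(slice(None, l) for l in shape[2:])]

w = icnr_sample((16, 8, 4, 4), stride=2)
print(w.shape)  # (16, 8, 4, 4)
# adjacent rows/columns of each kernel share one value (nearest-neighbor structure)
print(np.allclose(w[..., 0::2, :], w[..., 1::2, :]))  # True
print(np.allclose(w[..., :, 0::2], w[..., :, 1::2]))  # True
```

Odd kernel sizes also work: for `shape=(4, 3, 5, 5)` and `stride=2`, the subkernel is 3×3, the replication yields 6×6, and the final slice crops back to 5×5.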
The name is taken from the paper; a less cryptic one would be preferred.
@stephenlombardi: Since you're experimenting with convolutional autoencoders, would you like to give this a try?
Using it to train a network now, it seems to work well. Thanks for the heads up!
Great! What I actually meant with "would you like to give this a try" was "would you like to try turning this into a pull request for Lasagne"? :) If my implementation seems to do the right thing, it would mostly be a matter of finding a suitable name, documenting it, and adding tests. See http://lasagne.readthedocs.io/en/latest/user/development.html#how-to-contribute for help. Let me know if you're up to it!
I can give it a shot next week.
@f0k: This is an amazing extension that you just implemented! I've just read the paper and it seems to improve image generation a lot. Could you please explain how to exactly initialize the sub kernels? I don't understand how to "...initialize W0 and then copy the weights to the rest of the sub kernels...", as they explain in the paper. Can I just take a Glorot/Xavier initialized sub kernel and copy this initialization to all the other sub kernels? I want to implement the same initialization scheme in keras/tensorflow :)
> Could you please explain how exactly to initialize the sub kernels?
You basically initialize weights for smaller convolution kernels (scaled down by the stride, this is what they refer to as subkernels), and then upscale the kernels to their intended size by nearest neighbor interpolation (= by replicating the pixel values). This ensures the transposed convolution initially behaves like a small convolution followed by nearest neighbor upscaling, and then refines the weights from there.
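For the sub-pixel convolution (pixel-shuffle) formulation you asked about, the same idea amounts to drawing one subkernel W0 and copying it to all r² subkernels, so every sub-pixel position starts from identical weights. A minimal NumPy sketch; note that the repeat axis and ordering assumed here depend on how your pixel-shuffle implementation lays out the output channels, so check this against your framework:

```python
import numpy as np

r = 2                       # upscaling factor
c_in, c_out, k = 8, 16, 3   # the conv outputs c_out * r**2 channels, then pixel-shuffles

# 1. initialize a single subkernel W0 (Gaussian here for brevity;
#    use a Glorot/Xavier draw in practice, as you suggest)
w0 = np.random.standard_normal((c_out, c_in, k, k))

# 2. copy W0 to all r**2 subkernels: each output channel's weights are
#    repeated r**2 times, one copy per sub-pixel position
w = np.repeat(w0, r**2, axis=0)   # shape (c_out * r**2, c_in, k, k)
```

With this layout, output channel `i * r**2 + j` holds subkernel `i`'s weights for sub-pixel position `j`, so the layer initially produces the same value at every sub-pixel position, i.e. nearest-neighbor upsampling.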
@lhegen: A TensorFlow implementation is here: https://github.com/kostyaev/ICNR