Sander Dieleman
I think the general consensus is that local response normalization is sort of pointless. To the best of my knowledge people aren't really using this anymore. That said, if you...
> The downside of 1 is that it reduces our ability to infer shapes of intermediate variables in the graph, and it makes the expressions specifying the shapes more complicated....
> Then, the elementwise operators like **add** will look at their arguments for presence of broadcast_mask, and if it exists, they can behave like the cgt.broadcast function on their own...
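To make the quoted mechanism concrete, here is a rough numpy sketch of the idea; the `Var` wrapper and `bcast_add` helper are hypothetical stand-ins for illustration, not CGT's actual classes or the real signature of `cgt.broadcast`:

```python
import numpy as np

class Var:
    """Toy wrapper: a value plus an optional broadcast_mask marking size-1 axes."""
    def __init__(self, value, broadcast_mask=None):
        self.value = np.asarray(value)
        # broadcast_mask[i] == True means axis i has size 1 and may be broadcast
        self.broadcast_mask = broadcast_mask

def bcast_add(a, b):
    """Elementwise add that only broadcasts axes explicitly marked in a broadcast_mask.
    Assumes both arguments have the same number of dimensions (enough for the sketch)."""
    def expand(var, target_shape):
        if var.broadcast_mask is None:
            return var.value
        reps = [t if m else 1 for m, t in zip(var.broadcast_mask, target_shape)]
        return np.tile(var.value, reps)
    shape = tuple(max(da, db) for da, db in zip(a.value.shape, b.value.shape))
    return expand(a, shape) + expand(b, shape)

x = Var(np.ones((2, 3, 4)))
bias = Var(np.arange(3).reshape(1, 3, 1), broadcast_mask=(True, False, True))
print(bcast_add(x, bias).shape)  # (2, 3, 4)
```

The point is that broadcasting only happens along axes the mask explicitly marks, so shape inference for everything else stays simple.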
I second that :) Can't wait to try this out! With the pervasiveness of 3x3 convolutions nowadays, this could be a game changer.
Awesome, looking forward to that!
Although the cuDNN results in this benchmark are not too spectacular, I tried it out myself today with Theano, and compared it with some other available implementations (legacy conv2d, fftconv,...
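For anyone who wants to run a similar comparison, a minimal timing sketch in Theano could look like the following; the shapes are made up, and which convolution implementation actually runs (legacy, FFT, cuDNN) depends on your Theano configuration and flags:

```python
import time
import numpy as np
import theano
import theano.tensor as T

# Made-up shapes: batch 128, 64 -> 128 channels, 32x32 images, 3x3 filters.
x = T.tensor4('x')
w = T.tensor4('w')
y = T.nnet.conv2d(x, w)  # which backend runs depends on your Theano config

f = theano.function([x, w], y)

x_val = np.random.randn(128, 64, 32, 32).astype(theano.config.floatX)
w_val = np.random.randn(128, 64, 3, 3).astype(theano.config.floatX)

f(x_val, w_val)  # warm-up call (compilation, memory allocation)
start = time.time()
for _ in range(10):
    f(x_val, w_val)  # note: timing includes copying the result back to the host
print("avg time per call: %.4f s" % ((time.time() - start) / 10))
```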
Interesting, from those results the conclusion would be pretty different, if I'm reading them correctly. I wonder why it makes such a big difference for what I tried (very simple...
@stencilman I see, so it's just using the forward pass implementation for all three operations (forward, backward w.r.t. weights, backward w.r.t. input). But that doesn't explain why it seems to...
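For context, all three operations can be expressed in terms of the same convolution/correlation primitive, which is presumably why reusing one kernel is possible at all. A tiny 1D numpy sketch of the relationship (my own illustration, not a description of what cuDNN does internally):

```python
import numpy as np

N, K = 8, 3
x = np.random.randn(N)           # input
w = np.random.randn(K)           # filter
dy = np.random.randn(N - K + 1)  # gradient flowing back into the output

# Forward pass: valid cross-correlation, y[i] = sum_k x[i+k] * w[k]
y = np.correlate(x, w, mode='valid')

# Backward w.r.t. input: "full" convolution of dy with the (flipped) filter
dx = np.convolve(dy, w, mode='full')

# Backward w.r.t. weights: valid cross-correlation of the input with dy
dw = np.correlate(x, dy, mode='valid')

# Numerical check of dx against finite differences of sum(y * dy)
eps = 1e-6
dx_num = np.zeros_like(x)
for i in range(N):
    xp = x.copy(); xp[i] += eps
    xm = x.copy(); xm[i] -= eps
    dx_num[i] = (np.correlate(xp, w, 'valid') - np.correlate(xm, w, 'valid')).dot(dy) / (2 * eps)
print(np.allclose(dx, dx_num, atol=1e-5))  # True
```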
> Also, having HWN contiguous means that you can do 1x1 conv super efficiently in a basic gemm kernel.

The same goes for NHWC though, right? If I'm not mistaken,...
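To make the 1x1-as-gemm point concrete, here is a small numpy sketch with made-up shapes: with the channel axis contiguous per pixel, as in NHWC, the whole convolution collapses into a single matrix product.

```python
import numpy as np

N, H, W, C_in, C_out = 32, 16, 16, 64, 128
x = np.random.randn(N, H, W, C_in)  # NHWC layout
w = np.random.randn(C_in, C_out)    # a 1x1 filter bank is just a C_in x C_out matrix

# 1x1 convolution == one big gemm over all N*H*W pixels at once,
# because the channel axis is contiguous in NHWC.
y = x.reshape(-1, C_in).dot(w).reshape(N, H, W, C_out)
print(y.shape)  # (32, 16, 16, 128)
```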
> The issue we're having with cuDNN with 1x1 is that NCHW has the C "in the middle". Today, our direct convolution is similar to 3x3 or 5x5 for 1x1...
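And for comparison, the same 1x1 convolution sketched with NCHW storage: without reshuffling the data first, it turns into a batch of smaller gemms (one per image) rather than a single big one, which seems to be the "C in the middle" problem being described.

```python
import numpy as np

N, C_in, H, W, C_out = 32, 64, 16, 16, 128
x = np.random.randn(N, C_in, H, W)  # NCHW layout
w = np.random.randn(C_out, C_in)    # 1x1 filter bank

# With C "in the middle", each image is a separate (C_in, H*W) matrix,
# so a 1x1 conv becomes N gemms (or one gemm after transposing the data).
y = np.stack([w.dot(x[n].reshape(C_in, H * W)) for n in range(N)])
y = y.reshape(N, C_out, H, W)
print(y.shape)  # (32, 128, 16, 16)
```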