Andrew Lavin

Results: 64 comments of Andrew Lavin

@nouiz L1 is clearly not a middle layer, so the only necessary change is to remove the gradInput calculation from the L1 results.

@soumith Why have the cuDNN R4 Googlenet-1 numbers changed?

OK, thanks, I wasn't expecting the Neon Googlenet-1 speedup to decrease. ;-) So these numbers make more sense.

That's great, Julien. Can't wait to see Winograd/Cook/Toom in cuDNN. :-) It has been almost a year since I discovered this new approach to convnet acceleration, and it is great...

Now if you guys could put your heads together and figure out a way to end the NCHW vs CHWN vs NHWC wars. There must be some way to equip...
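
To make the layout question concrete, here's a minimal sketch in plain C (hypothetical shapes, not tied to any framework) of how the same logical element `(n, c, h, w)` lands at a different linear offset under each ordering, which is why a kernel tuned for one layout generally can't read the other two without a transpose:

```c
/* Illustrative only: linear offsets of the same logical element (n, c, h, w)
 * under the three layouts. Shapes and indices below are hypothetical. */
#include <stdio.h>

static size_t idx_nchw(size_t n, size_t c, size_t h, size_t w,
                       size_t C, size_t H, size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

static size_t idx_nhwc(size_t n, size_t c, size_t h, size_t w,
                       size_t C, size_t H, size_t W) {
    return ((n * H + h) * W + w) * C + c;
}

static size_t idx_chwn(size_t n, size_t c, size_t h, size_t w,
                       size_t N, size_t H, size_t W) {
    return ((c * H + h) * W + w) * N + n;
}

int main(void) {
    size_t N = 2, C = 3, H = 4, W = 5;   /* hypothetical tensor shape */
    size_t n = 1, c = 2, h = 3, w = 4;   /* one logical element */
    printf("NCHW offset: %zu\n", idx_nchw(n, c, h, w, C, H, W));
    printf("NHWC offset: %zu\n", idx_nhwc(n, c, h, w, C, H, W));
    printf("CHWN offset: %zu\n", idx_chwn(n, c, h, w, N, H, W));
    return 0;
}
```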

An API is also an important requirement for adoption. That is the real reason cuDNN has been so successful: it defined a low-level C API for deep learning primitives,...

A standard API for deep learning primitives would also mean that frameworks would be able to support any GPU or hardware platform that implements the API. The fact that none...

@hughperkins As @cliffwoolley suggested, call `cudnnFindConvolutionForwardAlgorithm()` to find the fastest cuDNN convolution algorithm for a given layer configuration. In my paper I used `cudnnGetConvolutionForwardAlgorithm()`, which cannot be relied on to...
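
For anyone trying this, here is a rough sketch of the call sequence, assuming a v6/v7-era cuDNN with descriptor-based setup; the layer shape below is just an example, not one from the paper, and `cudnnFindConvolutionForwardAlgorithm()` allocates its own buffers, so no device memory has to be set up by hand:

```c
/* Sketch: ask cuDNN to time every forward-convolution algorithm for one
 * layer shape and report them sorted by speed. Error handling is reduced
 * to a single macro for brevity. Link with -lcudnn. */
#include <cudnn.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK_CUDNN(call) do {                                        \
    cudnnStatus_t s_ = (call);                                        \
    if (s_ != CUDNN_STATUS_SUCCESS) {                                 \
        fprintf(stderr, "cuDNN error %s at line %d\n",                \
                cudnnGetErrorString(s_), __LINE__);                   \
        exit(1);                                                      \
    }                                                                 \
} while (0)

int main(void) {
    /* Hypothetical layer: N=64, C=64, H=W=56, K=64, 3x3 filters, pad 1. */
    int n = 64, c = 64, h = 56, w = 56, k = 64, r = 3, s = 3;

    cudnnHandle_t handle;
    CHECK_CUDNN(cudnnCreate(&handle));

    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
    CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
    CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));

    CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, n, c, h, w));
    CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                           CUDNN_TENSOR_NCHW, k, c, r, s));
    CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                                CUDNN_CROSS_CORRELATION,
                                                CUDNN_DATA_FLOAT));

    /* Let cuDNN tell us the output shape for this input/filter/conv combo. */
    int on, oc, oh, ow;
    CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                      &on, &oc, &oh, &ow));
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, on, oc, oh, ow));

    /* Benchmark the available algorithms; results come back sorted by time. */
    const int requested = 8;
    int returned = 0;
    cudnnConvolutionFwdAlgoPerf_t perf[8];
    CHECK_CUDNN(cudnnFindConvolutionForwardAlgorithm(handle, xDesc, wDesc,
                                                     convDesc, yDesc,
                                                     requested, &returned,
                                                     perf));

    for (int i = 0; i < returned; ++i) {
        if (perf[i].status != CUDNN_STATUS_SUCCESS) continue;
        printf("algo %d: %.3f ms, workspace %zu bytes\n",
               (int)perf[i].algo, perf[i].time, perf[i].memory);
    }

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}
```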

Why not use 128-bit texture loads instead? That would extend the reach of the texture indices, reduce the number of load instructions, and cut down the indexing arithmetic.
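
Roughly what I mean, as a CUDA texture-object sketch (the kernel and buffer names are made up, not from any existing codebase): bind the input buffer as a `float4` texture so each fetch returns 128 bits. The texture index then counts `float4` elements rather than single floats, so the same index range covers four times as many values, and each fetch replaces four scalar loads.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

/* Hypothetical kernel: scale an array read through a float4 texture. */
__global__ void scale_float4(cudaTextureObject_t tex, float *out,
                             int n4, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = tex1Dfetch<float4>(tex, i);   /* one 128-bit load */
        out[4 * i + 0] = alpha * v.x;
        out[4 * i + 1] = alpha * v.y;
        out[4 * i + 2] = alpha * v.z;
        out[4 * i + 3] = alpha * v.w;
    }
}

int main(void) {
    const int n  = 1 << 20;          /* number of floats, multiple of 4 */
    const int n4 = n / 4;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    /* Describe the buffer as a linear texture of float4 elements. */
    cudaResourceDesc res = {};
    res.resType                = cudaResourceTypeLinear;
    res.res.linear.devPtr      = d_in;
    res.res.linear.desc        = cudaCreateChannelDesc<float4>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &texDesc, NULL);

    scale_float4<<<(n4 + 255) / 256, 256>>>(tex, d_out, n4, 2.0f);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFree(d_out);
    cudaFree(d_in);
    return 0;
}
```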

Why is the VGG benchmark using a mini-batch size of 32? The paper seems to say the mini-batch size was 256 / 4 GPUs = 64 per GPU. http://arxiv.org/abs/1409.1556