Andrew Lavin

Results: 64 comments of Andrew Lavin

@nouiz L1 is clearly not a middle layer, so the only necessary change is to remove the gradInput calculation from the L1 results.

@soumith Why have the cuDNN R4 Googlenet-1 numbers changed?

OK, thanks, I wasn't expecting the Neon Googlenet-1 speedup to decrease. ;-) So these numbers make more sense.

That's great, Julien. Can't wait to see Winograd/Cook/Toom in cuDNN. :-) It has been almost a year since I discovered this new approach to convnet acceleration, and it is great...

Now if you guys could put your heads together and figure out a way to end the NCHW vs CHWN vs NHWC wars. There must be some way to equip...
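
To make the layout question concrete, here's a minimal sketch in plain C (hypothetical shapes, not tied to any framework) of how the same logical element `(n, c, h, w)` lands at a different linear offset under each ordering, which is why a kernel tuned for one layout generally can't read the other two without a transpose:

```c
/* Illustrative only: linear offsets of the same logical element (n, c, h, w)
 * under the three layouts. Shapes and indices below are hypothetical. */
#include <stdio.h>

static size_t idx_nchw(size_t n, size_t c, size_t h, size_t w,
                       size_t C, size_t H, size_t W) {
    return ((n * C + c) * H + h) * W + w;
}

static size_t idx_nhwc(size_t n, size_t c, size_t h, size_t w,
                       size_t C, size_t H, size_t W) {
    return ((n * H + h) * W + w) * C + c;
}

static size_t idx_chwn(size_t n, size_t c, size_t h, size_t w,
                       size_t N, size_t H, size_t W) {
    return ((c * H + h) * W + w) * N + n;
}

int main(void) {
    size_t N = 2, C = 3, H = 4, W = 5;   /* hypothetical tensor shape */
    size_t n = 1, c = 2, h = 3, w = 4;   /* one logical element */
    printf("NCHW offset: %zu\n", idx_nchw(n, c, h, w, C, H, W));
    printf("NHWC offset: %zu\n", idx_nhwc(n, c, h, w, C, H, W));
    printf("CHWN offset: %zu\n", idx_chwn(n, c, h, w, N, H, W));
    return 0;
}
```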

An API is also an important requirement for adoption. That is the real reason cuDNN has been so successful: it defined a low-level C API for deep learning primitives,...

A standard API for deep learning primitives would also mean that frameworks would be able to support any GPU or hardware platform that implements the API. The fact that none...

@hughperkins As @cliffwoolley suggested, call `cudnnFindConvolutionForwardAlgorithm()` to find the fastest cuDNN convolution algorithm for a given layer configuration. In my paper I used `cudnnGetConvolutionForwardAlgorithm()`, which cannot be relied on to...
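
For anyone trying this, here is a rough sketch of the call sequence, assuming a v6/v7-era cuDNN with descriptor-based setup; the layer shape below is just an example, not one from the paper, and `cudnnFindConvolutionForwardAlgorithm()` allocates its own buffers, so no device memory has to be set up by hand:

```c
/* Sketch: ask cuDNN to time every forward-convolution algorithm for one
 * layer shape and report them sorted by speed. Error handling is reduced
 * to a single macro for brevity. Link with -lcudnn. */
#include <cudnn.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK_CUDNN(call) do {                                        \
    cudnnStatus_t s_ = (call);                                        \
    if (s_ != CUDNN_STATUS_SUCCESS) {                                 \
        fprintf(stderr, "cuDNN error %s at line %d\n",                \
                cudnnGetErrorString(s_), __LINE__);                   \
        exit(1);                                                      \
    }                                                                 \
} while (0)

int main(void) {
    /* Hypothetical layer: N=64, C=64, H=W=56, K=64, 3x3 filters, pad 1. */
    int n = 64, c = 64, h = 56, w = 56, k = 64, r = 3, s = 3;

    cudnnHandle_t handle;
    CHECK_CUDNN(cudnnCreate(&handle));

    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
    CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
    CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
    CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));

    CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, n, c, h, w));
    CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                           CUDNN_TENSOR_NCHW, k, c, r, s));
    CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                                CUDNN_CROSS_CORRELATION,
                                                CUDNN_DATA_FLOAT));

    /* Let cuDNN tell us the output shape for this input/filter/conv combo. */
    int on, oc, oh, ow;
    CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                      &on, &oc, &oh, &ow));
    CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                           CUDNN_DATA_FLOAT, on, oc, oh, ow));

    /* Benchmark the available algorithms; results come back sorted by time. */
    const int requested = 8;
    int returned = 0;
    cudnnConvolutionFwdAlgoPerf_t perf[8];
    CHECK_CUDNN(cudnnFindConvolutionForwardAlgorithm(handle, xDesc, wDesc,
                                                     convDesc, yDesc,
                                                     requested, &returned,
                                                     perf));

    for (int i = 0; i < returned; ++i) {
        if (perf[i].status != CUDNN_STATUS_SUCCESS) continue;
        printf("algo %d: %.3f ms, workspace %zu bytes\n",
               (int)perf[i].algo, perf[i].time, perf[i].memory);
    }

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}
```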

Why not use 128-bit texture loads instead? That would extend the reach of the texture indices, reduce the number of load instructions, and cut down the indexing arithmetic.
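
Roughly what I mean, as a CUDA texture-object sketch (the kernel and buffer names are made up, not from any existing codebase): bind the input buffer as a `float4` texture so each fetch returns 128 bits. The texture index then counts `float4` elements rather than single floats, so the same index range covers four times as many values, and each fetch replaces four scalar loads.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

/* Hypothetical kernel: scale an array read through a float4 texture. */
__global__ void scale_float4(cudaTextureObject_t tex, float *out,
                             int n4, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = tex1Dfetch<float4>(tex, i);   /* one 128-bit load */
        out[4 * i + 0] = alpha * v.x;
        out[4 * i + 1] = alpha * v.y;
        out[4 * i + 2] = alpha * v.z;
        out[4 * i + 3] = alpha * v.w;
    }
}

int main(void) {
    const int n  = 1 << 20;          /* number of floats, multiple of 4 */
    const int n4 = n / 4;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    /* Describe the buffer as a linear texture of float4 elements. */
    cudaResourceDesc res = {};
    res.resType                = cudaResourceTypeLinear;
    res.res.linear.devPtr      = d_in;
    res.res.linear.desc        = cudaCreateChannelDesc<float4>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &texDesc, NULL);

    scale_float4<<<(n4 + 255) / 256, 256>>>(tex, d_out, n4, 2.0f);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFree(d_out);
    cudaFree(d_in);
    return 0;
}
```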

Why is the VGG benchmark using a mini-batch size of 32? The paper seems to say the mini-batch size was 256 / 4 GPUs = 64 per GPU. http://arxiv.org/abs/1409.1556