Andrew Lavin

Results: 61 comments of Andrew Lavin

Mini-batch size can have a drastic effect on performance, so changing it for a benchmark is significant. It would also be useful to know which libraries failed to run full VGG....

I should note that the theoretical speedup of the 16x16 tile FFT is much better with 5x5 kernels, but the Winograd algorithm might still be advantageous in situations where the smaller...

> call the command again to disable these regions.

That is undocumented behavior.

The main concern is numeric accuracy; for this, simple rational numbers seem to work best. Any selection of unique roots will give the minimal number of real multiplications.

Hi @promach , I wrote the "Fast algorithms ..." paper before I wrote winCNN. The paper uses a more general technique that can compute a much larger family of fast...

Because F(4,5) has the same degree as F(6,3), the same 7 interpolation points should work for both algorithms. Generally, for any F(m,r) with m+r-2 = n, you can try the...
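To make the point-count relation above concrete, here is a tiny sketch (the function names are my own, not from wincnn):

```python
def winograd_points_needed(m, r):
    """Finite interpolation points needed by the Cook-Toom construction
    of F(m, r): polynomials of degree m + r - 2 are interpolated at
    m + r - 2 distinct finite points (plus the point at infinity).
    """
    return m + r - 2

def winograd_multiplies(m, r):
    # F(m, r) computes m outputs of an r-tap filter with m + r - 1
    # multiplications per tile, versus m * r for the direct method.
    return m + r - 1

# F(4,5) and F(6,3) have the same degree, so the same 7 points serve both.
print(winograd_points_needed(4, 5), winograd_points_needed(6, 3))  # 7 7
print(winograd_multiplies(6, 3), 6 * 3)  # 8 multiplies vs 18 direct
```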

Yes, the order you list the points in does not matter, and otherwise your list is the same as mine. The usual caveat applies: the more interpolation points your...

Yes, F(2,3) should work well with float16 precision, but as always, actually measure the error compared to a high-precision direct convolution for your application!
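The measurement suggested above can be sketched in a few lines of numpy. This uses the usual textbook F(2,3) transform matrices (a minimal sketch, not taken from the comment):

```python
import numpy as np

# Standard 1D Winograd F(2,3) transforms: Y = A^T [(G g) * (B^T d)].
BT = np.array([[1, 0, -1,  0],
               [0, 1,  1,  0],
               [0, -1, 1,  0],
               [0, 1,  0, -1]], dtype=np.float64)
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]], dtype=np.float64)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f23(d, g, dtype):
    # Run the whole pipeline in the requested precision.
    U = G.astype(dtype) @ g.astype(dtype)    # transformed filter
    V = BT.astype(dtype) @ d.astype(dtype)   # transformed data
    return AT.astype(dtype) @ (U * V)        # 2 outputs per 4-wide tile

rng = np.random.default_rng(0)
d = rng.standard_normal(4)   # one input tile
g = rng.standard_normal(3)   # one 3-tap filter

# High-precision direct (cross-correlation) reference.
ref = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])

err = np.abs(winograd_f23(d, g, np.float16).astype(np.float64) - ref).max()
print(err)  # small but nonzero float16 error
```

The same comparison, run over your real activations and weights, tells you whether the float16 error is acceptable for your application.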

There are a couple of ways to think about this. I will assume you are using 8-bit integers and not 8-bit floating point numbers. For deployment, the network weights are...

Another thing to try with int8 winograd is to quantize each of the winograd components separately. This might be especially helpful when the input to the convolutional layer is the...
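The per-component idea above can be sketched on synthetic data (the quantizer, scales, and error metric here are my own illustration, not from the comment):

```python
import numpy as np

BT = np.array([[1, 0, -1,  0],
               [0, 1,  1,  0],
               [0, -1, 1,  0],
               [0, 1,  0, -1]], dtype=np.float64)

def quantize(x, scale):
    # Symmetric int8 quantization to [-127, 127].
    return np.clip(np.round(x / scale), -127, 127)

rng = np.random.default_rng(0)
# Smooth inputs: a large per-tile offset plus small noise, so the
# difference-like Winograd components are much smaller than the sums.
tiles = 10.0 * rng.standard_normal((1000, 1)) + 0.1 * rng.standard_normal((1000, 4))
V = tiles @ BT.T   # Winograd-domain data, one row per tile, 4 components

# One scale for the whole transformed tensor ...
s_global = np.abs(V).max() / 127.0
err_global = np.abs(quantize(V, s_global) * s_global - V).mean()

# ... versus one scale per Winograd component (column).
s_comp = np.abs(V).max(axis=0) / 127.0
err_comp = np.abs(quantize(V, s_comp) * s_comp - V).mean()

print(err_global, err_comp)  # per-component scales give lower mean error
```

When the Winograd components have very different dynamic ranges, as with the smooth inputs here, separate scales waste far fewer of the 8 bits on the small components.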