Maratyszcza

Results 229 comments of Maratyszcza

@ngaloppo The timings are per batch. The network parameters are from the `cpu` branch of `convnet-benchmarks`. Please see #2 or Maratyszcza/NNPACK#9 for details. Backward pass is not supported and...

I changed the defaults in `caffe.proto` and recompiled Caffe for each algorithm.

@ngaloppo Do you use the prototxt files from `convnet-benchmarks`? Specifications from other sources (e.g. the Caffe model zoo) may have different image sizes or numbers of channels in hidden layers.

@anijain2305 NNPACK uses `OMP_NUM_THREADS` if the variable is set, or all virtual threads if it is not.
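The rule described in this comment can be sketched as follows. This is only an illustration in Python, not NNPACK's actual code: the function name `resolve_thread_count` is hypothetical, and falling back to the logical CPU count when the variable holds an invalid value is an assumption.

```python
import os


def resolve_thread_count(environ=None):
    """Pick a thread count: OMP_NUM_THREADS if set, else all logical CPUs.

    Hypothetical helper for illustration; not part of NNPACK.
    """
    env = os.environ if environ is None else environ
    value = env.get("OMP_NUM_THREADS")
    if value is not None:
        try:
            count = int(value)
        except ValueError:
            count = 0  # assumption: treat unparsable values as "not set"
        if count > 0:
            return count
    # os.cpu_count() reports logical (virtual) processors; it may return None.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("threads:", resolve_thread_count())
```

Passing a dictionary instead of the real environment, e.g. `resolve_thread_count({"OMP_NUM_THREADS": "4"})`, makes the rule easy to check in isolation.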

@wangxi123 If you want to reproduce the results from the README, **don't** use the `--enable-psimd` option.

@wangxi123 When you add `engine: NNPACK`, Caffe uses the NNPACK implementation. If NNPACK is configured **with** `--enable-psimd`, it is a generic small-SIMD implementation using SSE2. If you configure NNPACK...

@wangxi123 `WINOGRAD` algorithm is implemented only for 3x3 kernels. `AUTO` will choose an algorithm automatically, among FFT, Winograd transform, and implicit GEMM.

@wangxi123 In the current implementation of most convolution functions in NNPACK, you need a fairly large batch size to get a speedup (at least 128, preferably 256). Note that it doesn't affect...

This test checks all `2**32` possible input values, so most likely you just need to wait longer.
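To show why such a test runs for a long time, here is a minimal sketch of an exhaustive check over 32-bit input patterns. It is not the actual NNPACK test: it assumes the inputs are single-precision floats enumerated by bit pattern, and the names `bits_to_float32` and `count_mismatches` are hypothetical.

```python
import math
import struct


def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def count_mismatches(func, reference, start=0, stop=2**32):
    """Compare func against reference on every bit pattern in [start, stop)."""
    mismatches = 0
    for bits in range(start, stop):
        x = bits_to_float32(bits)
        got, want = func(x), reference(x)
        # NaN never compares equal to itself, so treat NaN == NaN as a match.
        both_nan = math.isnan(got) and math.isnan(want)
        if not both_nan and got != want:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # A small slice of the input space; the default range covers all 2**32
    # patterns and takes far longer.
    print(count_mismatches(abs, math.fabs, 0x3F800000, 0x3F800000 + 1000))
```

Restricting `start` and `stop` to a sub-range is a convenient way to confirm the loop works before committing to the full sweep.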

Thank you for the link. We'll mention it in later revisions of the paper, but the operation is not quite the same as FPADDRE. I don't fully understand the "young...