Tensorization for avx2

Open kimishpatel opened this issue 6 years ago • 4 comments

Summary: Like the existing avx512 tensorization, which reduces data:1x4 and kernel:16x4 to output:1x16, this PR introduces the same reduction using avx2 tensorization. It keeps the same API as avx512, so no new memory layout for weights is needed.
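For readers unfamiliar with the shapes, here is a minimal pure-Python sketch of what the reduction computes. It is only a reference for the arithmetic, not the intrinsic from this PR: the function name `dot_1x4_16x4` is hypothetical, and plain Python integers stand in for the narrow input types and the 32-bit accumulator that the test file name (`acc32`) suggests.

```python
def dot_1x4_16x4(data, kernel):
    """Reference semantics only: out[j] = sum over k of data[k] * kernel[j][k].

    data   : 4 integers (the 1x4 input slice)
    kernel : 16 rows of 4 integers (the 16x4 weight tile)
    returns: 16 integers (the 1x16 output)
    """
    assert len(data) == 4
    assert len(kernel) == 16 and all(len(row) == 4 for row in kernel)
    # Each output lane is a 4-element dot product of data with one kernel row.
    return [sum(d * w for d, w in zip(data, row)) for row in kernel]


# Illustrative inputs (not taken from the PR's test).
data = [1, 2, 3, 4]
kernel = [[j] * 4 for j in range(16)]  # row j is filled with the value j
out = dot_1x4_16x4(data, kernel)
print(out)
```

Since an AVX2 register is 256 bits wide, it holds 8 lanes of 32-bit integers, so a 1x16 output would presumably span two registers where avx512 needs one. Keeping the 16x4 kernel tile lets the weight layout stay identical across both targets.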

Test Plan: On an avx2 machine, run: python tests/python/contrib/test_gemm_avx2_acc32.py

Sep 20 '19 00:09 kimishpatel

NVM, just saw your other PR. :)

Sep 20 '19 03:09 yinghai

Aaah this one is messed up. My base branch was tensorize_fix. So it shows changes from that. Let me fix this.

Sep 20 '19 14:09 kimishpatel

Depends on this PR: https://github.com/facebookexperimental/tvm/pull/7

Sep 20 '19 14:09 kimishpatel

Benchmark numbers for m = n = k = 1024: Tensorization: running time 25.363 ms, 84.67 Gops/s.
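As a sanity check on the throughput figure, the short calculation below derives Gops/s from the reported running time. It assumes each multiply-accumulate counts as two operations (2 * m * n * k in total), which is the convention that reproduces the number quoted above.

```python
# Matrix dimensions and running time as reported in the benchmark.
m = n = k = 1024
elapsed_ms = 25.363

# Assumption: one multiply plus one add per inner-product term.
ops = 2 * m * n * k

# Convert milliseconds to seconds, then scale to giga-operations per second.
gops = ops / (elapsed_ms * 1e-3) / 1e9
print(f"{gops:.2f} Gops/s")
```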

Sep 20 '19 14:09 kimishpatel