DeepShift
Wondering about CPU performance with mixed assembly
Is it possible to apply mixed assembly techniques to DeepShift to achieve a large speedup (>=5x) in inference?
I believe that to obtain a large speedup like >=5x, we need hardware that operates natively on 4-bit values (to represent the weights), instead of dealing with 8-bit values and having to do masking and bit extraction. Perhaps some embedded AI chips or embedded GPUs may have that.
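For illustration, here is a minimal sketch (with hypothetical helper names, not DeepShift's actual kernel code) of the masking and bit extraction that byte-addressable hardware forces on us when two 4-bit weights are packed into one byte:

```c
#include <stdint.h>
#include <stdio.h>

/* Two 4-bit weights packed into one uint8_t. On byte-addressable
   hardware, every weight read pays for a mask/shift to extract the
   nibble, plus a sign extension back to full register width. */
static inline int8_t unpack_lo(uint8_t packed) {
    return (int8_t)(packed << 4) >> 4;  /* low nibble, sign-extended */
}
static inline int8_t unpack_hi(uint8_t packed) {
    return (int8_t)packed >> 4;         /* high nibble, sign-extended */
}

int main(void) {
    uint8_t packed = 0xA3;  /* high nibble 0xA (-6), low nibble 0x3 (+3) */
    printf("lo=%d hi=%d\n", unpack_lo(packed), unpack_hi(packed));
    return 0;
}
```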
Having said that, there's still a lot of room to speed up the CPU and CUDA kernels we have developed: fusing convolution with elementwise operations, using JIT compilers instead of pre-built binaries, and tuning tile sizes.
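As a concrete example of the first item, here is a minimal sketch (hypothetical 1-D shapes and names, not our actual kernels) of fusing an elementwise post-op into the convolution loop, so the bias add and ReLU happen while the accumulator is still in a register instead of in a second pass over the output buffer:

```c
#include <stddef.h>

/* 1-D convolution with bias + ReLU fused into the accumulation loop. */
void conv1d_fused(const float *x, const float *w, float bias,
                  float *y, size_t n, size_t k) {
    for (size_t i = 0; i + k <= n; ++i) {
        float acc = bias;                 /* start from the bias */
        for (size_t j = 0; j < k; ++j)
            acc += x[i + j] * w[j];       /* convolution accumulation */
        y[i] = acc > 0.0f ? acc : 0.0f;   /* fused ReLU, no extra pass */
    }
}
```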
For CPU, we can look for instruction set architectures that have vector (i.e., SIMD) instructions for bitwise shifts and sign flips.
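For example, x86 CPUs with AVX2 already provide per-lane variable shifts (`vpsllvd` / `_mm256_sllv_epi32`) and sign flips (`vpsignd` / `_mm256_sign_epi32`). A sketch (hypothetical function and buffer names, assuming int32 activations and per-element shift/sign arrays) of a shift-based "multiply" applied 8 lanes at a time, with no integer multiply in the inner loop:

```c
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

void shift_mul_avx2(const int32_t *x, const int32_t *shift,
                    const int32_t *sign, int32_t *y, size_t n) {
    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256i vx = _mm256_loadu_si256((const __m256i *)(x + i));
        __m256i vs = _mm256_loadu_si256((const __m256i *)(shift + i));
        __m256i vg = _mm256_loadu_si256((const __m256i *)(sign + i));
        __m256i shifted = _mm256_sllv_epi32(vx, vs);   /* x << shift, per lane */
        __m256i out = _mm256_sign_epi32(shifted, vg);  /* negate/zero/keep by sign */
        _mm256_storeu_si256((__m256i *)(y + i), out);
    }
    for (size_t i = n & ~(size_t)7; i < n; ++i) {      /* scalar tail */
        int32_t v = x[i] << shift[i];
        y[i] = sign[i] < 0 ? -v : (sign[i] == 0 ? 0 : v);
    }
}
```

ARM NEON has analogous instructions (e.g., `vshlq_s32` takes per-lane signed shift counts), so a similar kernel should be portable to embedded CPUs as well.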