flans39

Results: 2 comments by flans39

> Motivation: we use FBGEMM to get accuracy consistent with PyTorch dynamic quantization.
> As TurboTransformers' optimizations focus on non-GEMM operations, we can reuse the PyTorch QLinear code...
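
For readers unfamiliar with the term, here is a minimal pure-Python sketch of what "dynamic quantization" of a linear layer means: weights are quantized to int8 ahead of time, while the activation scale is computed from each input at call time. This is only an illustration under simplifying assumptions (symmetric per-tensor scales, no zero point); it is not how FBGEMM or PyTorch's QLinear is implemented, and the names `quantize_symmetric` and `dynamic_quantized_linear` are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dynamic_quantized_linear(x, weight_q, weight_scale, bias):
    """Compute y = x @ W^T + b with int8 weights and call-time activation scaling."""
    # The "dynamic" step: the activation scale is derived from this input.
    x_q, x_scale = quantize_symmetric(x)
    out = []
    for row_q, b in zip(weight_q, bias):
        acc = sum(a * w for a, w in zip(x_q, row_q))  # integer accumulation
        out.append(acc * x_scale * weight_scale + b)  # dequantize, add fp bias
    return out


# Toy layer with 3 inputs and 2 outputs (made-up values for illustration).
weight = [[0.5, -1.0, 0.25], [1.5, 0.75, -0.5]]
bias = [0.1, -0.2]

# Quantize the weights once, ahead of time, with a per-tensor scale.
flat_q, w_scale = quantize_symmetric([w for row in weight for w in row])
k = len(weight[0])
weight_q = [flat_q[i:i + k] for i in range(0, len(flat_q), k)]

x = [1.0, 2.0, -3.0]
y_fp32 = [sum(a * w for a, w in zip(x, row)) + b for row, b in zip(weight, bias)]
y_int8 = dynamic_quantized_linear(x, weight_q, w_scale, bias)
max_err = max(abs(a - b) for a, b in zip(y_fp32, y_int8))
print("fp32:", y_fp32, "int8:", y_int8, "max abs error:", max_err)
```

The quantized result differs slightly from the fp32 one because of rounding. Two backends that round or scale differently would not match each other exactly, which is one way to read the motivation for reusing the same GEMM path as PyTorch.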

> Can you paste some preliminary benchmarking results on PyTorch dynamic quantization?

Here they are (on a virtual 8-core CPU @ 2.6 GHz):

M, N, K | Threads | torch fp32...