flans39 comments

Results: 2 comments of flans39

Developing CPU INT8 quantization

> Motivation: we use FBGEMM in order to have accuracy consistent with PyTorch dynamic quantization.
> As TurboTransformers' optimizations are focused on non-GEMM operations, we can reuse PyTorch QLinear code...

Developing CPU INT8 quantization

> Can you paste some preliminary benchmarking results on PyTorch dynamic quantization?

Here it is. (on a virtual 8-core CPU @ 2.6 GHz)

M, N, K | Threads | torch fp32...