djl
Support dynamic quantization for PyTorch
Description
Dynamic quantization: weights are quantized ahead of time, while activations are read/stored in floating point and quantized on the fly for compute.
Will this change the current API?
- Model.quantize
Who will benefit from this enhancement?
Today, PyTorch supports the following backends for running quantized operators efficiently:
- x86 CPUs with AVX2 support or higher (without AVX2, some operations have inefficient implementations), via the x86 backend optimized by fbgemm and onednn (see the details in the RFC)
- ARM CPUs (typically found in mobile/embedded devices), via qnnpack
- (early prototype) support for NVIDIA GPUs via TensorRT through fx2trt (to be open sourced)
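For reference, on the PyTorch side this is exposed through the eager-mode quantize_dynamic API, which a DJL Model.quantize method would presumably delegate to under the hood. A minimal sketch (the toy model here is illustrative, not from the source):

```python
import torch

# A small float model; dynamic quantization targets layers such as
# Linear and LSTM, whose weights dominate memory and compute.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

# Weights are converted to int8 ahead of time; activations stay in
# floating point and are quantized on the fly at compute time.
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Inference still takes and returns ordinary float tensors.
out = qmodel(torch.randn(1, 4))
print(out.shape)  # torch.Size([1, 2])
```

Because no calibration data is needed, this is the lowest-friction quantization mode to surface through a one-call API like the proposed Model.quantize.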