
Support dynamic quantization of PyTorch

Open zaobao opened this issue 10 months ago • 0 comments

Description

Dynamic quantization: weights are quantized ahead of time, while activations are read and stored in floating point and quantized on the fly for compute.
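For reference, here is a minimal sketch of what dynamic quantization looks like on the PyTorch side, using torch.ao.quantization.quantize_dynamic. The toy model is purely illustrative, and it is an assumption of this sketch that a DJL-level Model.quantize would delegate to this call:

```python
# Minimal sketch (not DJL code): the PyTorch-side dynamic quantization call
# that a DJL-level Model.quantize would presumably delegate to.
import torch
import torch.nn as nn

# Toy model for illustration only.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Weights of nn.Linear modules are quantized to int8 ahead of time;
# activations stay in floating point and are quantized on the fly for compute.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Inference runs through the quantized linear kernels of the active backend.
out = quantized(torch.randn(1, 128))
```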

Will this change the current API?

  • Model.quantize

Who will benefit from this enhancement?

Today, PyTorch supports the following backends for running quantized operators efficiently (backend selection is sketched after the list):

  • x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations), via the x86 backend optimized by fbgemm and onednn (see the details in the RFC)

  • ARM CPUs (typically found in mobile/embedded devices), via qnnpack

  • (early prototype) support for NVIDIA GPUs via TensorRT through fx2trt (to be open sourced)
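Which of these backends is active is controlled by PyTorch's quantized engine setting. The sketch below shows how the engine could be inspected and selected; the preference order is an assumption for illustration, not anything decided in DJL:

```python
# Sketch: inspecting and selecting PyTorch's quantized-operator backend.
# The available engines depend on how PyTorch was built for the platform.
import torch

engines = torch.backends.quantized.supported_engines
print(engines)  # e.g. ['none', 'onednn', 'x86', 'fbgemm'] on a server CPU build

# Assumed preference: the unified x86 backend (fbgemm + onednn) on server CPUs,
# falling back to qnnpack on ARM/mobile builds.
if "x86" in engines:
    torch.backends.quantized.engine = "x86"
elif "qnnpack" in engines:
    torch.backends.quantized.engine = "qnnpack"
```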

References

zaobao • Apr 02 '24 13:04