AQLM Support for Computer Vision models

I'm curious to know how these techniques can be applied to Computer Vision models that need to be deployed on OpenVINO and TensorRT. I suspect that it will take a long time to get similar support in the OpenVINO and TensorRT frameworks. However, I'm not sure about PyTorch.

Can you please shed some light on this?

Aug 30 '24 05:08 harshdhamecha

Hi, @harshdhamecha. Thanks for your interest in the project.

Could you elaborate on your specific use case? The AQLM method is model-agnostic and may be readily applied to common CV architectures (CNNs and DiTs). However, the practical benefit depends heavily on your model size. The main value proposition of AQLM is the reduction of model size, which for sufficiently large models and memory-bound inference may offer speed-ups. In practice, the benefits become pronounced for billion-parameter models. If your model is not large (10M-100M parameters), the additional overhead induced by dequantization is likely to make inference too slow.
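To make the size argument concrete, here is a rough back-of-the-envelope sketch in Python. It only estimates weight storage and assumes about 2 bits per parameter for the quantized weights versus 16 bits for the baseline. Both bit widths, the function names, and the two example parameter counts are illustrative assumptions, not measurements or part of the AQLM API. Codebook storage, activations, and dequantization cost are ignored.

```python
def weight_footprint_mib(num_params: int, bits_per_param: float) -> float:
    """Approximate weight storage in MiB.

    Ignores codebooks, scales, and activations, so treat it as a lower bound.
    """
    return num_params * bits_per_param / 8 / 2**20


def compare_footprints(num_params: int,
                       quant_bits: float = 2.0,   # assumed quantized bit width
                       base_bits: float = 16.0):  # assumed fp16 baseline
    """Return baseline size, quantized size, and absolute savings in MiB."""
    base = weight_footprint_mib(num_params, base_bits)
    quant = weight_footprint_mib(num_params, quant_bits)
    return {"base_mib": base, "quant_mib": quant, "saved_mib": base - quant}


if __name__ == "__main__":
    # Hypothetical model sizes: a mid-sized CNN and a billion-scale model.
    for name, n in [("25M-param CNN", 25_000_000),
                    ("7B-param model", 7_000_000_000)]:
        r = compare_footprints(n)
        print(f"{name}: {r['base_mib']:.0f} MiB -> {r['quant_mib']:.0f} MiB "
              f"(saves {r['saved_mib']:.0f} MiB)")
```

The compression ratio is the same in both cases, but the absolute savings for a small model amount to tens of MiB, which is unlikely to offset the dequantization overhead. For a billion-parameter model the savings are on the order of gigabytes, which is where memory-bound inference can benefit.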

Concerning TensorRT/OpenVINO, at the moment we do not have plans for integration with these frameworks.

Could you please specify what you mean by PyTorch support: a vanilla torch implementation or something like TorchServe?

Sep 04 '24 13:09 Godofnothing

This issue is stale because it has been open for 30 days with no activity.

Oct 05 '24 01:10 github-actions[bot]

This issue was closed because it has been inactive for 14 days since being marked as stale.

Oct 19 '24 01:10 github-actions[bot]