
Save and load dynamically quantized model

Open roman-dobrov opened this issue 2 years ago • 4 comments

Hello! First of all, great work on instructor.

I'd like to load an already-quantized model to avoid the CPU/memory spikes at script startup that the quantization itself causes.

I tried static quantization first, but it is not supported for SentenceTransformers with float16 or qint8. With dynamic quantization I get the following error when trying to load a saved state_dict:

RuntimeError: Error(s) in loading state_dict for INSTRUCTOR:
        Unexpected key(s) in state_dict: "2.linear.scale", "2.linear.zero_point", "2.linear._packed_params.dtype", "2.linear._packed_params._packed_params".

I tried two save methods: saving the state dict directly with torch.save(model.state_dict()) and saving a traced version with torch.jit.trace, but both result in the same error. So, is there a way to save and load a quantized model?

roman-dobrov avatar Nov 16 '23 13:11 roman-dobrov

Hi, Thanks a lot for your interest in the INSTRUCTOR model!

The following works for me:

import torch
from InstructorEmbedding import INSTRUCTOR
from torch.nn import Embedding, Linear
from torch.ao.quantization import quantize_dynamic
from torch.ao.quantization.qconfig import (
    default_dynamic_qconfig,
    float_qparams_weight_only_qconfig,
)

model = INSTRUCTOR('hkunlp/instructor-large', device='cpu')

# Embedding layers need the float-qparams weight-only qconfig;
# Linear layers use the default dynamic qconfig.
qconfig_dict = {
    Embedding: float_qparams_weight_only_qconfig,
    Linear: default_dynamic_qconfig,
}

qmodel = quantize_dynamic(model, qconfig_dict)
torch.save(qmodel.state_dict(), 'state.pt')

Hope this helps!

hongjin-su avatar Dec 19 '23 09:12 hongjin-su

@hongjin-su Thank you for your response! Does loading the quantized model also work for you?

roman-dobrov avatar Dec 19 '23 09:12 roman-dobrov

Yeah, this seems to work:

>>> import torch
>>> a = torch.load('state.pt')
/home/linuxbrew/.linuxbrew/Cellar/[email protected]/3.11.6/lib/python3.11/site-packages/torch/_utils.py:376: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  device=storage.device,

hongjin-su avatar Dec 19 '23 09:12 hongjin-su

@hongjin-su And how do you turn it back into an actual model? torch.load returns an OrderedDict, i.e. a state dict, and calling load_state_dict gives me the aforementioned error before I can actually use the model.

roman-dobrov avatar Dec 19 '23 14:12 roman-dobrov
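A possible way forward, sketched here as an assumption rather than a verified fix for INSTRUCTOR itself: the "unexpected" keys (scale, zero_point, _packed_params) only exist on a model that has already been quantized, so quantize_dynamic must be applied to the freshly constructed model before load_state_dict is called on it. The idea is illustrated below with a tiny stand-in torch model (build_model, the layer sizes, and the state.pt file name are made up for the example); for INSTRUCTOR one would instead rebuild the model with INSTRUCTOR('hkunlp/instructor-large', device='cpu') and quantize it with the same qconfig_dict used at save time.

```python
import torch
from torch.nn import Linear, Sequential
from torch.ao.quantization import quantize_dynamic

# Stand-in for the real model; in the thread this would be
# INSTRUCTOR('hkunlp/instructor-large', device='cpu').
def build_model():
    return Sequential(Linear(4, 8), Linear(8, 2))

# Save side: quantize first, then save the *quantized* state dict.
qmodel = quantize_dynamic(build_model(), {Linear}, dtype=torch.qint8)
torch.save(qmodel.state_dict(), 'state.pt')

# Load side: rebuild and quantize the same way, THEN load the state dict.
# Loading into the unquantized model would raise the reported
# "Unexpected key(s) in state_dict" RuntimeError.
fresh = quantize_dynamic(build_model(), {Linear}, dtype=torch.qint8)
# weights_only=False because the quantized state dict contains
# non-tensor entries such as torch.dtype objects.
fresh.load_state_dict(torch.load('state.pt', weights_only=False))
```

After this, fresh can be used for inference directly; the quantization step on the rebuilt model is cheap because it only rewires module types before the saved weights are loaded.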