
Unable to save model in saved_results directory

Opened by emanbaeman · 3 comments

Hello,

I have been attempting to quantize the t5-small model using the t5-small topology. Despite changing hyperparameters such as tune = True and save_strategy="epoch", creating a fresh conda environment, and setting do_eval=False to see if it makes a difference, I am still unable to save the model results in the saved_results directory. After the run_quant.sh script finishes executing, it reports an error indicating that nothing has been saved. Please assist.

— emanbaeman, Oct 06 '23 12:10

Hi @emanbaeman,

Thanks for reporting the issue. I tried to reproduce it in my environment (Intel(R) Xeon(R) Gold 6336Y, Linux): the model is saved correctly after tuning completes, and I can find the saved model file "best_model.pt" under the saved_results directory.
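
For reference, the quantize-and-save flow that run_quant.sh drives corresponds roughly to the sketch below. This assumes the Neural Compressor 2.x post-training quantization API; the arguments are illustrative and may not match the script exactly.

from transformers import AutoModelForSeq2SeqLM
from neural_compressor import PostTrainingQuantConfig, quantization

# Load the FP32 t5-small model to be quantized.
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')

# Dynamic post-training quantization needs no calibration dataloader.
conf = PostTrainingQuantConfig(approach='dynamic')
q_model = quantization.fit(model=model, conf=conf)

# save() writes best_model.pt into the target directory.
q_model.save('./saved_results')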

As you mentioned, an error indicating that nothing has been saved appears after you execute the run_quant.sh script; could you please post the logs? It would also help to provide your environment info, e.g. CPU and OS.

— thuang6, Oct 07 '23 05:10

Hi, I managed to save the model in the saved_results directory. However, I am unable to load the model using

from neural_compressor.utils.pytorch import load
q_model = load('neural-compressor/examples/pytorch/nlp/huggingface_models/translation/quantization/ptq_dynamic/fx/saved_results', 'neural-compressor/examples/pytorch/nlp/huggingface_models/translation/quantization/ptq_dynamic/fx/saved_results/best_model.pt')

I even tried loading the model using model = AutoModelForSeq2SeqLM.from_pretrained('neural-compressor/examples/pytorch/nlp/huggingface_models/translation/quantization/ptq_dynamic/fx/saved_results/') but it asks for a config.json. It would be helpful if you could guide me further.

— emanbaeman, Oct 09 '23 12:10

You can refer to the following sample code for model loading:

import os

from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
)
from neural_compressor.utils.pytorch import load

# Rebuild the original FP32 model; load() needs it as a reference to
# restore the quantized weights saved in best_model.pt.
tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
model.resize_token_embeddings(len(tokenizer))

# Pass the saved_results directory (not the .pt file) plus the FP32 model.
q_model = load(os.path.abspath(os.path.expanduser('saved_results')), model)
print(q_model)

The resulting log is attached below.

$ python load.py
/home/thuang6/.local/lib/python3.10/site-packages/torch/_utils.py:335: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  device=storage.device,
T5ForConditionalGeneration(
  (shared): GraphModule(
    (module): QuantizedEmbedding(num_embeddings=32100, embedding_dim=512, dtype=torch.quint8, qscheme=torch.per_channel_affine_float_qparams)
  )
  (encoder): T5Stack(
    (embed_tokens): Embedding(32100, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): GraphModule(
                (module): DynamicQuantizedLinear(in_features=512, out_features=512, dtype=torch.qint8, qscheme=torch.per_channel_affine)
              )
              (k): GraphModule(
                (module): DynamicQuantizedLinear(in_features=512, out_features=512, dtype=torch.qint8, qscheme=torch.per_channel_affine)
              )
              ...
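
Once the model loads, you can sanity-check it with a quick inference sketch like the one below, reusing tokenizer and q_model from the sample above. This assumes the loaded model keeps the T5ForConditionalGeneration interface (as the printed structure suggests), so .generate() still works; the prompt is just an illustration.

# Untested sketch: run a translation with the quantized model,
# reusing tokenizer and q_model from the loading sample above.
text = "translate English to German: The house is wonderful."
inputs = tokenizer(text, return_tensors='pt')
outputs = q_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))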

— thuang6, Oct 10 '23 06:10

We haven't heard back for a while, so let's close this for now. Feel free to reopen if you need more help. Thank you!

— thuang6, Apr 26 '24 03:04