neural-compressor
Unable to save model in saved_results directory
Hello,
I have been attempting to quantize the t5-small model using the t5-small topology. I have tried changing hyperparameters such as `tune=True` and `save_strategy="epoch"`, created a fresh conda environment, and set `do_eval=False` just to see if it makes a difference, but I am still unable to save the model results in the saved_results directory. After the run_quant.sh script finishes executing, it reports an error indicating that nothing has been saved. Please assist.
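For reference, here is a minimal sketch of the save path this example exercises, assuming the neural-compressor 2.x post-training quantization API (the actual run_quant.sh script wires this up with more options):

```python
# Minimal sketch, assuming the neural-compressor 2.x PTQ API;
# the example's run_quant.sh adds dataset handling and tuning options on top.
from transformers import AutoModelForSeq2SeqLM
from neural_compressor import PostTrainingQuantConfig, quantization

model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
conf = PostTrainingQuantConfig(approach='dynamic')  # ptq_dynamic, as in this example
q_model = quantization.fit(model, conf)  # dynamic quantization needs no calibration data
q_model.save('saved_results')  # should write best_model.pt under saved_results
```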
Hi @emanbaeman,
Thanks for reporting the issue. I tried to reproduce it in my environment (Intel(R) Xeon(R) Gold 6336Y, Linux), and the model is saved correctly after tuning completes; I can find the saved model file best_model.pt under the saved_results directory.
Since you mentioned an error indicating that nothing was saved after executing the run_quant.sh script, could you please post the logs? It would also be helpful to provide your environment info, e.g. CPU and OS.
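A quick way to confirm whether anything was written is to list the output directory (the path below is an assumption; adjust it to wherever your run writes its output):

```python
import os

saved_dir = 'saved_results'  # assumed location; adjust to your output directory
if os.path.isdir(saved_dir):
    print(os.listdir(saved_dir))  # a successful tuning run should leave 'best_model.pt' here
else:
    print(f'{saved_dir} does not exist -- no model was saved')
```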
Hi, I managed to save the model in the saved_results directory. However, I am unable to load the model using

```python
from neural_compressor.utils.pytorch import load
q_model = load('neural-compressor/examples/pytorch/nlp/huggingface_models/translation/quantization/ptq_dynamic/fx/saved_results', 'neural-compressor/examples/pytorch/nlp/huggingface_models/translation/quantization/ptq_dynamic/fx/saved_results/best_model.pt')
```

I even tried loading the model with

```python
model = AutoModelForSeq2SeqLM.from_pretrained('neural-compressor/examples/pytorch/nlp/huggingface_models/translation/quantization/ptq_dynamic/fx/saved_results/')
```

but it asks for a config.json. It would be helpful if you could guide me further.
You can refer to the following sample code for model loading. Note that best_model.pt contains only the quantized weights and configuration, not a full Hugging Face checkpoint, which is why from_pretrained complains about a missing config.json; you need to rebuild the original FP32 model first and then apply the saved quantized weights with neural_compressor's load:

```python
import os
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
)

# Rebuild the original FP32 model to serve as the loading skeleton
tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
model.resize_token_embeddings(len(tokenizer))

# Apply the quantized weights saved under saved_results onto the FP32 model
from neural_compressor.utils.pytorch import load
q_model = load(os.path.abspath(os.path.expanduser('saved_results')), model)
print(q_model)
```
The resulting log is attached below:

```
$ python load.py
/home/thuang6/.local/lib/python3.10/site-packages/torch/_utils.py:335: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  device=storage.device,
T5ForConditionalGeneration(
  (shared): GraphModule(
    (module): QuantizedEmbedding(num_embeddings=32100, embedding_dim=512, dtype=torch.quint8, qscheme=torch.per_channel_affine_float_qparams)
  )
  (encoder): T5Stack(
    (embed_tokens): Embedding(32100, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): GraphModule(
                (module): DynamicQuantizedLinear(in_features=512, out_features=512, dtype=torch.qint8, qscheme=torch.per_channel_affine)
              )
              (k): GraphModule(
                (module): DynamicQuantizedLinear(in_features=512, out_features=512, dtype=torch.qint8, qscheme=torch.per_channel_affine)
              )
...
```
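As a further sanity check (the prompt below is arbitrary), the loaded q_model can be used like the original model, since the top-level module is still a T5ForConditionalGeneration:

```python
# Arbitrary smoke-test prompt; reuses the tokenizer and q_model from the sample above
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = q_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```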
We haven't heard back for a while, so we'll close this for now. Feel free to reopen if you need more help. Thank you!