neural-compressor
how to extract int8 weights from quantized model
When loading the quantized model (SmoothQuant) with
from neural_compressor.utils.pytorch import load
qmodel = load(qmodel_path, model_fp)
I got
RecursiveScriptModule(original_name=QuantizationDispatchModule)
I'd like to extract the quantized int8 weight matrices, together with the corresponding quantization parameters (scales, zero_points). What should I do?
Hi @chensterliu , can you provide more details on the model that you quantized, the strategy, and the versions of neural_compressor and intel_extension_for_pytorch?
Hello, I used
neural_compressor 2.5.1
intel-extension-for-pytorch 2.3.0
for the SmoothQuant run. What I did was simply run the script
neural-compressor/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py with the following arguments:
python -u run_clm_no_trainer.py \
--model "facebook/opt-125m" \
--dataset "lambada" \
--approach "static" \
--output_dir "quan_out" \
--quantize \
--batch_size 16 \
--ipex --int8_bf16_mixed --sq --alpha 0.5
I successfully got the quan_out dir with two files inside: best_configure.json and best_model.pt.
My question is: how do I get the quantized int8 weight matrices from those files? The method in my first post doesn't work, as the loaded qmodel is a RecursiveScriptModule. It seems to be a compiled artifact that can run inference, but whose weights can't be retrieved via state_dict(). I would appreciate it if you could offer any method to obtain those quantized integers, similar to named_parameters() on a normal torch.nn model.
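[Editor's note: for reference, this is what that kind of extraction looks like on a plain eager-mode quantized module. A minimal sketch, not from the thread and not applicable to the IPEX JIT artifact above; the Linear layer here is a placeholder for illustration.]

import torch

# Minimal sketch on an eager-mode quantized layer (NOT the IPEX JIT model):
# a torch.ao.nn.quantized.Linear exposes its packed weight via weight(),
# and the raw int8 values and quantization parameters can be read off it.
m = torch.ao.nn.quantized.Linear(4, 4)  # placeholder layer for illustration
qw = m.weight()                         # a quantized tensor
int8_vals = qw.int_repr()               # raw int8 weight matrix
if qw.qscheme() in (torch.per_tensor_affine, torch.per_tensor_symmetric):
    print(int8_vals, qw.q_scale(), qw.q_zero_point())
else:
    print(int8_vals, qw.q_per_channel_scales(), qw.q_per_channel_zero_points())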
Hi @chensterliu , I am able to run the command that you used to quantize, and I am able to load the model using
from neural_compressor.utils.pytorch import load
qmodel = load("./saved_results")
The command I used to quantize: python run_clm_no_trainer.py --dataset "lambada" --model facebook/opt-125m --quantize --batch_size 16 --sq --alpha 0.5 --ipex --output_dir "./saved_results" --int8_bf16_mixed
If you are still facing issues, can you directly try to load the model using this: https://github.com/intel/neural-compressor/blob/29fdecbbb44ceb8d19c12809af90dc23063becfc/neural_compressor/utils/pytorch.py#L274C1-L281C57
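[Editor's note: the linked lines load the IPEX-quantized artifact as a TorchScript module, so the direct route is roughly the sketch below, assuming quan_out/best_model.pt is the TorchScript file the example script wrote.]

import torch

# Minimal sketch: load the saved TorchScript artifact directly,
# assuming quan_out/best_model.pt is the file written by run_clm_no_trainer.py.
qmodel = torch.jit.load("quan_out/best_model.pt")
qmodel.eval()
print(type(qmodel))  # RecursiveScriptModule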
Hi @srinarayan-srikanthan , loading the qmodel is fine. My problem is that the loaded qmodel doesn't expose any weight information to me. Please see the attached figure; do you also get this RecursiveScriptModule? How do you get int8 weights from the qmodel?
The torch.jit model is packed for inference, so you cannot unpack it and see its weights.
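[Editor's note: one way to see this for yourself, a small sketch assuming qmodel is the loaded RecursiveScriptModule.]

# The parameters have been folded into the graph, so there is
# typically nothing left in named_parameters() to read out.
print(len(list(qmodel.named_parameters())))  # usually 0 for a packed JIT model
print(qmodel.graph)  # the folded weights appear as constants in the graph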
My goal is to extract those quantized int8 weights. Do you have workaround to achieve this? Or it is technically not possible.
Yes, can you try this workaround:
import torch

# Function to extract constant tensors folded into a frozen TorchScript graph
def extract_constants(frozen_model):
    constants = {}
    for node in frozen_model.graph.nodes():
        # Only inspect single-output nodes whose output is a tensor
        if node.outputsSize() == 1 and node.output().type().isSubtypeOf(torch._C.TensorType.get()):
            constant_name = node.output().debugName()
            constant_value = node.output().toIValue()
            constants[constant_name] = constant_value
    return constants

# Extract and print constants
constants = extract_constants(a)  # `a` is your loaded qmodel
print("Frozen Model Constants:")
for name, value in constants.items():
    print(f"{name}: {value}")
Thank you, the code works. The only subtle thing is that the printed names are indices, which makes it difficult to trace back which tensor belongs to which layer.
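[Editor's note: a small follow-up sketch, an editorial suggestion rather than part of the thread. Printing dtype and shape alongside each constant at least lets the index-style names be matched to layers by their distinctive weight shapes.]

for name, value in constants.items():
    if isinstance(value, torch.Tensor):
        # e.g. a Linear weight shows up with shape (out_features, in_features)
        print(name, value.dtype, tuple(value.shape))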
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.