
how to extract int8 weights from quantized model

Open chensterliu opened this issue 1 year ago • 8 comments

when loading the quantized model (smoothquant) with

from neural_compressor.utils.pytorch import load
qmodel = load(qmodel_path, model_fp)

I got a RecursiveScriptModule(original_name=QuantizationDispatchModule). I'd like to extract the quantized int8 weight matrices, together with the corresponding quantization parameters (scales, zero_points). What should I do?
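For reference, the obvious attempts don't surface anything (a sketch of what I see; qmodel is the loaded module from above):

print(type(qmodel))                      # torch.jit._script.RecursiveScriptModule
print(list(qmodel.named_parameters()))   # nothing usable here
print(list(qmodel.state_dict().keys()))  # no int8 weight tensors either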

chensterliu avatar May 25 '24 15:05 chensterliu

Hi @chensterliu , can you provide more details on the model that you quantized, the strategy, and the versions of neural_compressor and intel_extension_for_pytorch?

srinarayan-srikanthan avatar May 28 '24 14:05 srinarayan-srikanthan

Hello, I used

neural_compressor             2.5.1
intel-extension-for-pytorch   2.3.0 

for the SmoothQuant run. What I did was simply run the script neural-compressor/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py with arguments as follows:

    python -u run_clm_no_trainer.py \
        --model "facebook/opt-125m" \
        --dataset "lambada" \
        --approach "static" \
        --output_dir "quan_out" \
        --quantize \
        --batch_size 16 \
        --ipex --int8_bf16_mixed --sq --alpha 0.5

I successfully got the quan_out dir with 2 files inside: best_configure.json and best_model.pt.

My question is how to get the quantized int8 weight matrices from those files. The method in my first post doesn't work, as the loaded qmodel is a RecursiveScriptModule. It seems to be a compiled artifact that can run inference, but the weights can't be retrieved via state_dict(). I'd appreciate it if you could offer any method to obtain those quantized integers, similar to named_parameters() on a normal torch.nn model.

chensterliu avatar May 29 '24 09:05 chensterliu

Hi @chensterliu , I am able to run the command that you used to quantize, and I am able to load the model using:

from neural_compressor.utils.pytorch import load
qmodel = load("./saved_results")

The command I used to quantize:

python run_clm_no_trainer.py --dataset "lambada" --model facebook/opt-125m --quantize --batch_size 16 --sq --alpha 0.5 --ipex --output_dir "./saved_results" --int8_bf16_mixed

If you are still facing issues, can you try loading the model directly using this: https://github.com/intel/neural-compressor/blob/29fdecbbb44ceb8d19c12809af90dc23063becfc/neural_compressor/utils/pytorch.py#L274C1-L281C57
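If the helper still does not work for you, you can also try loading the TorchScript file directly (a sketch; the best_model.pt filename comes from your quan_out output above):

import torch

qmodel = torch.jit.load("quan_out/best_model.pt")
qmodel = torch.jit.freeze(qmodel.eval())  # freezing folds the weights into graph constants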

srinarayan-srikanthan avatar Jun 05 '24 22:06 srinarayan-srikanthan

Hi @srinarayan-srikanthan , loading the qmodel is fine. My problem is that the loaded qmodel doesn't expose any weight information. Please see the attached figure: do you also get this RecursiveScriptModule? How do you get int8 weights from the qmodel?

Screenshot_2024-06-06_15-07-34

chensterliu avatar Jun 06 '24 13:06 chensterliu

The torch.jit model is packed for inference, so you cannot unpack it and inspect its weights.
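You can confirm this yourself, assuming qmodel is the module you loaded: after freezing, the weights live in the graph as prim::Constant nodes rather than as named parameters.

print(list(qmodel.named_parameters()))  # typically empty for a frozen module
print(qmodel.graph)                     # the int8 tensors show up as prim::Constant nodes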

srinarayan-srikanthan avatar Jun 25 '24 05:06 srinarayan-srikanthan

My goal is to extract those quantized int8 weights. Do you have a workaround to achieve this, or is it technically not possible?

chensterliu avatar Jun 25 '24 08:06 chensterliu

Yes, can you try this workaround:

import torch

# Function to extract the tensor constants that freezing folded into the graph
def extract_constants(frozen_model):
    constants = {}
    for node in frozen_model.graph.nodes():
        # Only prim::Constant nodes hold folded weights; node.output() also
        # requires exactly one output, so skip everything else.
        if node.kind() != "prim::Constant" or node.outputsSize() != 1:
            continue
        if node.output().type().isSubtypeOf(torch._C.TensorType.get()):
            constant_name = node.output().debugName()
            constant_value = node.output().toIValue()
            constants[constant_name] = constant_value
    return constants

# Extract and print constants
constants = extract_constants(qmodel)  # your loaded model
print("Frozen Model Constants:")
for name, value in constants.items():
    print(f"{name}: {value}")

srinarayan-srikanthan avatar Jun 26 '24 02:06 srinarayan-srikanthan

Thank you, the code works. The only subtle thing is that the printed names are indices, which makes it difficult to trace back which tensor belongs to which layer.
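A possible heuristic to recover the mapping (an assumption on my side, since shapes can collide across layers): match each dumped tensor back to the fp32 layer with the same shape.

# Hypothetical helper: map dumped constants to fp32 layer names by shape.
# Shapes can collide across layers, so treat the result as a hint, not ground truth.
def guess_layer_names(constants, fp32_model):
    shape_to_layers = {}
    for pname, param in fp32_model.named_parameters():
        shape_to_layers.setdefault(tuple(param.shape), []).append(pname)
    return {
        cname: shape_to_layers.get(tuple(value.shape), ["<unknown>"])
        for cname, value in constants.items()
        if isinstance(value, torch.Tensor)
    }

mapping = guess_layer_names(constants, model_fp)  # model_fp: the fp32 model from my first post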

chensterliu avatar Jun 27 '24 15:06 chensterliu

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Nov 06 '25 22:11 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Nov 13 '25 22:11 github-actions[bot]