
how to extract int8 weights from quantized model

Open chensterliu opened this issue 1 year ago • 8 comments

when loading the quantized model (smoothquant) with

from neural_compressor.utils.pytorch import load
qmodel = load(qmodel_path, model_fp)

I got a RecursiveScriptModule(original_name=QuantizationDispatchModule). I'd like to extract the quantized int8 weight matrices, together with the corresponding quantization parameters (scales, zero_points). What should I do?
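For reference, the obvious attempts don't surface anything (a sketch of what I see; qmodel is the loaded module from above):

print(type(qmodel))                      # torch.jit._script.RecursiveScriptModule
print(list(qmodel.named_parameters()))   # nothing usable here
print(list(qmodel.state_dict().keys()))  # no int8 weight tensors either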

chensterliu avatar May 25 '24 15:05 chensterliu

Hi @chensterliu , can you provide more details on the model that you quantized, the strategy, and the versions of neural_compressor and intel_extension_for_pytorch?

srinarayan-srikanthan avatar May 28 '24 14:05 srinarayan-srikanthan

Hello, I used

neural_compressor             2.5.1
intel-extension-for-pytorch   2.3.0 

for the SmoothQuant run. What I did was simply run the script neural-compressor/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm/run_clm_no_trainer.py with arguments as follows:

    python -u run_clm_no_trainer.py \
        --model "facebook/opt-125m" \
        --dataset "lambada" \
        --approach "static" \
        --output_dir "quan_out" \
        --quantize \
        --batch_size 16 \
        --ipex --int8_bf16_mixed --sq --alpha 0.5

I successfully got the quan_out dir with 2 files inside: best_configure.json and best_model.pt.

My question is how to get the quantized int8 weight matrices from those files. The method in my first post doesn't work, as the loaded qmodel is a RecursiveScriptModule. It seems to be a compiled artifact that can run inference, but the weights can't be retrieved via state_dict(). I'd appreciate it if you could offer any method to obtain those quantized integers, similar to named_parameters() on a normal torch.nn model.

chensterliu avatar May 29 '24 09:05 chensterliu

Hi @chensterliu , I am able to run the command that you used to quantize, and I am able to load the model using:

from neural_compressor.utils.pytorch import load
qmodel = load("./saved_results")

The command I used to quantize:

python run_clm_no_trainer.py --dataset "lambada" --model facebook/opt-125m --quantize --batch_size 16 --sq --alpha 0.5 --ipex --output_dir "./saved_results" --int8_bf16_mixed

If you are still facing issues, can you try loading the model directly using this: https://github.com/intel/neural-compressor/blob/29fdecbbb44ceb8d19c12809af90dc23063becfc/neural_compressor/utils/pytorch.py#L274C1-L281C57
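If the helper still does not work for you, you can also try loading the TorchScript file directly (a sketch; the best_model.pt filename comes from your quan_out output above):

import torch

qmodel = torch.jit.load("quan_out/best_model.pt")
qmodel = torch.jit.freeze(qmodel.eval())  # freezing folds the weights into graph constants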

srinarayan-srikanthan avatar Jun 05 '24 22:06 srinarayan-srikanthan

Hi @srinarayan-srikanthan , loading the qmodel is fine. My problem is that the loaded qmodel doesn't expose any weight information. Please see the attached figure: do you also get this RecursiveScriptModule? How do you get int8 weights from the qmodel?

Screenshot_2024-06-06_15-07-34

chensterliu avatar Jun 06 '24 13:06 chensterliu

The torch.jit model is packed for inference, so you cannot unpack it and inspect its weights.
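You can confirm this yourself, assuming qmodel is the module you loaded: after freezing, the weights live in the graph as prim::Constant nodes rather than as named parameters.

print(list(qmodel.named_parameters()))  # typically empty for a frozen module
print(qmodel.graph)                     # the int8 tensors show up as prim::Constant nodes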

srinarayan-srikanthan avatar Jun 25 '24 05:06 srinarayan-srikanthan

My goal is to extract those quantized int8 weights. Do you have a workaround to achieve this, or is it technically not possible?

chensterliu avatar Jun 25 '24 08:06 chensterliu

Yes, can you try this workaround:

import torch

# Function to extract the tensor constants that freezing folded into the graph
def extract_constants(frozen_model):
    constants = {}
    for node in frozen_model.graph.nodes():
        # Only prim::Constant nodes hold folded weights; node.output() also
        # requires exactly one output, so skip everything else.
        if node.kind() != "prim::Constant" or node.outputsSize() != 1:
            continue
        if node.output().type().isSubtypeOf(torch._C.TensorType.get()):
            constant_name = node.output().debugName()
            constant_value = node.output().toIValue()
            constants[constant_name] = constant_value
    return constants

# Extract and print constants
constants = extract_constants(qmodel)  # your loaded model
print("Frozen Model Constants:")
for name, value in constants.items():
    print(f"{name}: {value}")

srinarayan-srikanthan avatar Jun 26 '24 02:06 srinarayan-srikanthan

Thank you, the code works. The only subtle thing is that the printed names are indices, which makes it difficult to trace back which tensor belongs to which layer.
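A possible heuristic to recover the mapping (an assumption on my side, since shapes can collide across layers): match each dumped tensor back to the fp32 layer with the same shape.

# Hypothetical helper: map dumped constants to fp32 layer names by shape.
# Shapes can collide across layers, so treat the result as a hint, not ground truth.
def guess_layer_names(constants, fp32_model):
    shape_to_layers = {}
    for pname, param in fp32_model.named_parameters():
        shape_to_layers.setdefault(tuple(param.shape), []).append(pname)
    return {
        cname: shape_to_layers.get(tuple(value.shape), ["<unknown>"])
        for cname, value in constants.items()
        if isinstance(value, torch.Tensor)
    }

mapping = guess_layer_names(constants, model_fp)  # model_fp: the fp32 model from my first post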

chensterliu avatar Jun 27 '24 15:06 chensterliu

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Nov 06 '25 22:11 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Nov 13 '25 22:11 github-actions[bot]