neural-compressor
Continue quantization from history.snapshot
I was wondering if there is a way to resume quantization from history.snapshot?
I am using ONNX with the onnxrt_cuda_ep backend.
I can quantize the model, but the code crashes before saving it (the crash is not related to INC). Is there a way to continue from history.snapshot instead of running the whole process from the beginning?
Applying AWQ clip Progress: [####################] 100.00%
2024-05-07 14:56:05 [INFO] |Mixed Precision Statistics|
2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
2024-05-07 14:56:05 [INFO] |  Op Type   |  Total  |   A32W4G32    |
2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
2024-05-07 14:56:05 [INFO] |   MatMul   |   193   |      193      |
2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
2024-05-07 14:56:05 [INFO] Pass quantize model elapsed time: 6294630.87 ms
2024-05-07 14:56:05 [INFO] Save tuning history to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57./history.snapshot.
2024-05-07 14:56:05 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-05-07 14:56:05 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-05-07 14:56:05 [INFO] Save deploy yaml to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57\deploy.yaml
Hi @oyazdanb,
Welcome to neural-compressor~
Yes, there is a function to resume quantization from history.snapshot.
I'll check the function and get back to you ASAP.
@oyazdanb the recover function is broken for some models (not for all). The development team is working on a fix.
In the meantime, I'll show you how to recover from history.snapshot, so you can check whether it works for your model.
If it does not work, you can either:
1) wait a few days; I'll notify you once it is fixed, or
2) install neural-compressor 2.0 and recover with 2.0. We do not recommend rolling back to an earlier version, though.
Here is the way you can try to recover. I'm not sure it works for your model yet.
from neural_compressor.utils.utility import recover
recover_qmodel = recover(fp32_onnx_model, "./nc_workspace/2024-05-10_19-16-32/history.snapshot", 0)
Here is the definition of recover:
def recover(fp32_model, tuning_history_path, num, **kwargs):
    """Get offline recover tuned model.

    Args:
        fp32_model: Input model path
        tuning_history_path: The tuning history path, which needs user to assign
        num: tune index
    """
Fixed the broken recover function. PR: https://github.com/intel/neural-compressor/pull/1788
Closing as the issue is fixed.