
Continue quantization from history.snapshot

Open oyazdanb opened this issue 1 year ago • 3 comments

I was wondering if there is a way to resume quantization from history.snapshot?

I am using onnx and onnxrt_cuda_ep.

I can quantize the model, but before saving the model the code crashes (not related to INC). Is there a way to continue from history.snapshot instead of running the code from the beginning?

    Applying AWQ clip Progress: [####################] 100.00%
    2024-05-07 14:56:05 [INFO] |Mixed Precision Statistics|
    2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
    2024-05-07 14:56:05 [INFO] |  Op Type   |  Total  |   A32W4G32    |
    2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
    2024-05-07 14:56:05 [INFO] |   MatMul   |   193   |      193      |
    2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
    2024-05-07 14:56:05 [INFO] Pass quantize model elapsed time: 6294630.87 ms
    2024-05-07 14:56:05 [INFO] Save tuning history to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57./history.snapshot.
    2024-05-07 14:56:05 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
    2024-05-07 14:56:05 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
    2024-05-07 14:56:05 [INFO] Save deploy yaml to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57\deploy.yaml

oyazdanb avatar May 08 '24 14:05 oyazdanb

Hi @oyazdanb,

Welcome to neural-compressor~

Yes, there is a function to resume quantization from history.snapshot.

I'll check the function and get back to you ASAP.

xiguiw avatar May 09 '24 09:05 xiguiw

@oyazdanb the recover function is broken for some models (not for all). The development team is working on a fix.

In the meantime, here is the way to recover from history.snapshot; you can try it to check whether it works for your model.

If it does not work, you can:

  1. Wait a few days; I'll notify you after it is fixed.
  2. Install neural-compressor 2.0 and recover with 2.0. We do not recommend rolling back to an earlier version, though.

Here is the way to try to recover. I am not sure it works for your model yet.

    from neural_compressor.utils.utility import recover

    recover_qmodel = recover(fp32_onnx_model, "./nc_workspace/2024-05-10_19-16-32/history.snapshot", 0)

Here is the definition of `recover`:

    def recover(fp32_model, tuning_history_path, num, **kwargs):
        """Get offline recover tuned model.

        Args:
            fp32_model: Input model path
            tuning_history_path: The tuning history path, which needs user to assign
            num: tune index
        """

xiguiw avatar May 10 '24 14:05 xiguiw

Fixed the broken recover. PR: https://github.com/intel/neural-compressor/pull/1788

xiguiw avatar May 11 '24 10:05 xiguiw

Closing as the issue is fixed.

xiguiw avatar Jul 20 '24 02:07 xiguiw