
Continue quantization from history.snapshot

Open oyazdanb opened this issue 1 year ago • 3 comments

I was wondering if there is a way to resume quantization from history.snapshot?

I am using onnx and onnxrt_cuda_ep.

I can quantize the model, but before saving the model the code crashes (not related to INC). Is there a way to continue from history.snapshot instead of running the code from the beginning?

    Applying AWQ clip Progress: [####################] 100.00%
    2024-05-07 14:56:05 [INFO] |Mixed Precision Statistics|
    2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
    2024-05-07 14:56:05 [INFO] |  Op Type   |  Total  |   A32W4G32    |
    2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
    2024-05-07 14:56:05 [INFO] |   MatMul   |   193   |      193      |
    2024-05-07 14:56:05 [INFO] +------------+---------+---------------+
    2024-05-07 14:56:05 [INFO] Pass quantize model elapsed time: 6294630.87 ms
    2024-05-07 14:56:05 [INFO] Save tuning history to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57./history.snapshot.
    2024-05-07 14:56:05 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
    2024-05-07 14:56:05 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
    2024-05-07 14:56:05 [INFO] Save deploy yaml to C:\llm\quantization\nc_workspace\2024-05-07_13-10-57\deploy.yaml

oyazdanb avatar May 08 '24 14:05 oyazdanb

Hi @oyazdanb,

Welcome to neural-compressor~

Yes, there is a function to resume quantization from history.snapshot.

I'll check the function and get back to you ASAP.

xiguiw avatar May 09 '24 09:05 xiguiw

@oyazdanb the recover function is broken for some models (not for all). The development team is working on a fix.

In the meantime, here is the way to recover from history.snapshot; you can try it to check whether it works for your model.

If it does not work, you can:

  1. Wait a few days; I'll notify you after it is fixed.
  2. Install neural-compressor 2.0 and recover with 2.0. We do not recommend rolling back to an earlier version, though.

Here is the way to try to recover. I am not sure it works for your model yet.

    from neural_compressor.utils.utility import recover

    recover_qmodel = recover(fp32_onnx_model, "./nc_workspace/2024-05-10_19-16-32/history.snapshot", 0)

Here is the definition of `recover`:

    def recover(fp32_model, tuning_history_path, num, **kwargs):
        """Get offline recover tuned model.

        Args:
            fp32_model: Input model path
            tuning_history_path: The tuning history path, which needs user to assign
            num: tune index
        """

xiguiw avatar May 10 '24 14:05 xiguiw

Fixed the broken recover. PR: https://github.com/intel/neural-compressor/pull/1788

xiguiw avatar May 11 '24 10:05 xiguiw

Closing as the issue is fixed.

xiguiw avatar Jul 20 '24 02:07 xiguiw