
Enable resuming of Quantization Aware Training checkpoints

Open · levzlotnik opened this pull request 5 years ago · 4 comments

Added a fix for the quant-aware-train checkpoint resuming issue #185

levzlotnik · Jul 07 '19

Hi @guyjacob ,

A quick summary of my changes:

  1. Modified the flow of `compress_classifier.py` to first load the `compression_scheduler` and then the checkpoint (if there is one).
  2. Modified `QuantizationPolicy` to transform the model during the `on_epoch_begin` callback instead of in the constructor. This raised an issue with the flow: in some QAT schemes a new parameter is created in the model (`module.clip_val`) but was never moved to the correct device, so I now move the entire model to the correct device during the `on_epoch_begin` callback.
  3. Added `model.is_quantized` to indicate whether a transformation is still needed.
  4. Added `dummy_input` to `model.quantizer_metadata` to allow easy access when loading.

Combined, these changes enable loading QAT checkpoints with `--resume-from` :) Please let me know what you think.
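
Taken together, items 2–4 amount to a lazy transformation at epoch start. Here is a minimal sketch of that flow; the signatures are illustrative assumptions, not the PR's actual diff (the real `QuantizationPolicy` and `Quantizer.prepare_model` in Distiller differ in details):

```python
class QuantizationPolicy:
    def __init__(self, quantizer, dummy_input):
        # The model is no longer transformed in the constructor (item 2)...
        self.quantizer = quantizer
        self.dummy_input = dummy_input

    def on_epoch_begin(self, model, epoch, optimizer, device):
        # ...but lazily here, and only if the model was not already
        # transformed, e.g. by resuming a QAT checkpoint (item 3).
        if not getattr(model, 'is_quantized', False):
            self.quantizer.prepare_model(self.dummy_input)
            model.is_quantized = True
            # dummy_input is kept in the metadata so a later
            # load_checkpoint can re-apply the transformation (item 4).
            model.quantizer_metadata['dummy_input'] = self.dummy_input
        # QAT can create new parameters (e.g. module.clip_val in PACT)
        # that start on the CPU; move the whole model to the right device.
        model.to(device)
```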

levzlotnik · Jul 26 '19

So is there any follow-up on merging this PR?

Thank you,
Yun-Chen Lo

yunchenlo · Oct 28 '19

The flow now works for quantizers that don't add new param groups to the optimizer. That is, it does not work for quantizers that implement `_get_new_optimizer_params_groups()` (e.g. `PACTQuantizer`). Those are still a WIP.
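
To make the remaining failure concrete, here is a standalone PyTorch reproduction (not Distiller code) of why such quantizers break resuming: the checkpoint's optimizer state dict contains the extra param group, but on resume the optimizer is rebuilt before the quantizer has added it:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A PACT-like quantizer adds its clip values as an extra param group:
clip_val = nn.Parameter(torch.tensor(8.0))
optimizer.add_param_group({'params': [clip_val], 'lr': 0.01})
saved = optimizer.state_dict()  # two param groups

# On resume the optimizer is rebuilt before quantization runs, so it
# has only one param group, and restoring the saved state fails:
fresh = torch.optim.SGD(model.parameters(), lr=0.1)
try:
    fresh.load_state_dict(saved)
except ValueError as e:
    print(e)  # "loaded state dict has a different number of parameter groups"
```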

guyjacob · Dec 03 '19

Hi, how do I evaluate the model after quantizing with quantization-aware training? I still have a similar issue:

```
Traceback (most recent call last):
  File "compress_classifier.py", line 212, in <module>
    main()
  File "compress_classifier.py", line 74, in main
    app = ClassifierCompressorSampleApp(args, script_dir=os.path.dirname(__file__))
  File "compress_classifier.py", line 164, in __init__
    super().__init__(args, script_dir)
  File "/home/th.nguyen/PycharmProjects/SAsimulate_quantize/distiller/distiller/apputils/image_classifier.py", line 71, in __init__
    self.start_epoch, self.ending_epoch) = _init_learner(self.args)
  File "/home/th.nguyen/PycharmProjects/SAsimulate_quantize/distiller/distiller/apputils/image_classifier.py", line 401, in _init_learner
    model = apputils.load_lean_checkpoint(model, args.load_model_path, model_device=args.device)
  File "/home/th.nguyen/PycharmProjects/SAsimulate_quantize/distiller/distiller/apputils/checkpoint.py", line 92, in load_lean_checkpoint
    lean_checkpoint=True)[0]
  File "/home/th.nguyen/PycharmProjects/SAsimulate_quantize/distiller/distiller/apputils/checkpoint.py", line 224, in load_checkpoint
    quantizer = qmd['type'](model, **qmd['params'])
TypeError: __init__() missing 1 required positional argument: 'optimizer'
```

My command line:

```
python3.6 compress_classifier.py --arch preact_resnet20_cifar ../../Datasets/cifar10/ --evaluate --resume-from logs/2020.08.13-151953/checkpoint.pth.tar --gpus 1 --compress=../quantization/quant_aware_train/preact_resnet20_cifar_pact.yaml
```
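
For context, the failing line in the traceback rebuilds the quantizer from saved metadata as `qmd['type'](model, **qmd['params'])`; a PACT-style quantizer's `__init__` also requires an `optimizer`, which is not stored in `qmd['params']`. A hypothetical workaround sketch for evaluation-only loading (an assumption, not an official fix):

```python
import torch

def rebuild_quantizer(model, qmd):
    # What load_checkpoint effectively executes (per the traceback above):
    #     quantizer = qmd['type'](model, **qmd['params'])
    try:
        return qmd['type'](model, **qmd['params'])
    except TypeError:
        # Assumed workaround: PACT-style quantizers require an optimizer
        # in __init__, so supply a throwaway one when only evaluating.
        dummy_opt = torch.optim.SGD(model.parameters(), lr=0.0)
        return qmd['type'](model, optimizer=dummy_opt, **qmd['params'])
```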

thnguyen996 · Aug 14 '20