
Some suggestions for the NNI quantizer to improve support for the PyTorch DataParallel API

Open · un-knight opened this issue · 5 comments

Describe the issue: It seems that the NNI QAT_Quantizer does not work correctly with PyTorch DataParallel. Here are some suggestions for making NNI support the DataParallel API.

1. Assign buffer variables with in-place operations

https://github.com/microsoft/nni/blob/2ca227f8144f1bc2328a572017904f45afd4812d/nni/algorithms/compression/pytorch/quantization/quantizers.py#L254-L285

According to the PyTorch DataParallel documentation:

In each forward, module is replicated on each device, so any updates to the running module in forward will be lost. For example, if module has a counter attribute that is incremented in each forward, it will always stay at the initial value because the update is done on the replicas which are destroyed after forward.

In this case, I suggest replacing = with torch.Tensor.copy_() for all buffer variables, for example replacing module.tracked_min_input = torch.min(output) with module.tracked_min_input.copy_(torch.min(output)), as sketched below.
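A minimal standalone sketch (not NNI's actual quantizer code) of why the in-place copy matters; the buffer name tracked_min_input is borrowed from the linked quantizers.py, and the behavior relies on the device-0 replica sharing the original module's buffer storage:

```python
import torch
import torch.nn as nn

class MinTracker(nn.Module):
    """Toy module mimicking the quantizer's tracked_min_input buffer."""
    def __init__(self):
        super().__init__()
        self.register_buffer('tracked_min_input', torch.tensor(float('inf')))

    def forward(self, x):
        # Plain assignment rebinds the attribute on the replica and is lost
        # once the replica is destroyed after forward():
        #     self.tracked_min_input = torch.min(x)
        # In-place copy writes into the registered buffer's storage instead:
        self.tracked_min_input.copy_(torch.min(x.detach()))
        return x

model = MinTracker()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model.cuda())
model(torch.randn(8, 4))
tracker = model.module if isinstance(model, nn.DataParallel) else model
print(tracker.tracked_min_input)  # updated, not stuck at inf
```

Note that, as with BatchNorm running statistics under DataParallel, only the in-place update performed by the device-0 replica survives; the other replicas are discarded after forward.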

2. Register module.weight as a real buffer

https://github.com/microsoft/nni/blob/2ca227f8144f1bc2328a572017904f45afd4812d/nni/compression/pytorch/compressor.py#L473

I suggest replacing self.module.register_buffer('weight', self.module.old_weight) with self.module.register_buffer('weight', self.module.old_weight.data).

Since self.module.old_weight is a Parameter that requires gradient, self.module.register_buffer('weight', self.module.old_weight) leaves the registered weight as a Parameter that still requires gradient instead of a plain buffer; see the sketch below.
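A standalone illustration (not the actual NNI wrapper code) of the difference, using a plain nn.Linear and an old_weight parameter built by hand; the attribute names mirror the linked compressor.py:

```python
import torch
import torch.nn as nn

linear = nn.Linear(4, 4)
# Mimic the wrapper: keep the trainable weights as 'old_weight' and remove
# the original 'weight' parameter so the name can be re-registered.
linear.old_weight = nn.Parameter(linear.weight.detach().clone())
del linear.weight

# Passing the Parameter object itself leaves 'weight' requiring grad, i.e.
# it still behaves like a parameter rather than a plain buffer:
#     linear.register_buffer('weight', linear.old_weight)

# Passing the underlying tensor yields a true buffer (requires_grad=False),
# which DataParallel then broadcasts like any other buffer:
linear.register_buffer('weight', linear.old_weight.data)

print(type(linear.weight), linear.weight.requires_grad)
# <class 'torch.Tensor'> False
print(linear(torch.randn(2, 4)).shape)  # forward still works: torch.Size([2, 4])
```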

un-knight commented May 10 '21 09:05

Very meaningful suggestions, thank you! In upcoming releases we will refactor the current quantizer implementation, and your suggestions will be considered seriously. You are also welcome to submit a PR to improve the current design.

linbinskn commented May 13 '21 11:05

@QuanluZhang we could consider these suggestions while refactoring Quantization and Pruning.

J-shang commented Jul 25 '22 02:07

Looking forward to using the new version of NNI :)

un-knight commented Jul 26 '22 01:07

Thanks @un-knight for the feedback. This will be part of the long-term effort in @J-shang's Quantization and Pruning plan; we will reference this issue as a use case for that larger feature in future releases. Stay tuned!

scarlett2018 commented Sep 28 '22 02:09

Does NNI v3.0 support multi-GPU training for model compression (pruner, distiller, quantizer)? I found that multi-GPU training via DataParallel was supported in v1.4; here is the example: https://github.com/microsoft/nni/blob/v1.4/examples/model_compress/multi_gpu.py However, this DataParallel-related issue is still open.

DY-ATL commented Oct 04 '23 03:10