
Error when applying 8-bits quantization on Object Detection model from Create ML using Transfer Learning

Open · chillyjee opened this issue 3 years ago • 3 comments

🐞 Describing the bug

I've been trying to reduce the size of my Object Detection Core ML model generated by Xcode's Create ML tool with Transfer Learning. I found that the transfer-learning .mlmodel that Create ML outputs stores its weights in a mixed (Float32, Float16) format.

I could successfully quantize this file to Float16, which already saved 0.3 MB, but I am not able to quantize either the original model or the Float16 one down to 8 bits. I get a NumPy error: ValueError: operands could not be broadcast together with shapes (0,1) (128,0)

-> Is there a workaround for this error, such as converting the mixed (Float32, Float16) model up to full-precision Float32 before 8-bit quantization?
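
I'm not aware of a public "dequantize to Float32" API, but something like the following is what I have in mind: a rough sketch that promotes the Float16 weight blobs in the spec back to Float32 by hand. Field names follow the public NeuralNetwork.proto WeightParams definition; the layer types covered (convolution, innerProduct, batchnorm) are illustrative, not exhaustive, and the pipeline walk assumes the Create ML object detector layout.

import numpy as np
import coremltools as ct

def promote_weight_params(wp):
    # WeightParams stores Float16 weights as raw bytes in float16Value;
    # decode them and re-store as Float32 in floatValue.
    if wp.float16Value:
        values = np.frombuffer(wp.float16Value, dtype=np.float16).astype(np.float32)
        wp.floatValue.extend(values.tolist())
        wp.float16Value = b""

mlmodel = ct.models.MLModel("objectDetection.mlmodel")
spec = mlmodel.get_spec()

# Create ML object detectors are pipelines; walk the neural network sub-models.
for sub in spec.pipeline.models:
    if sub.WhichOneof("Type") != "neuralNetwork":
        continue
    for layer in sub.neuralNetwork.layers:
        kind = layer.WhichOneof("layer")
        if kind == "convolution":
            promote_weight_params(layer.convolution.weights)
            promote_weight_params(layer.convolution.bias)
        elif kind == "innerProduct":
            promote_weight_params(layer.innerProduct.weights)
            promote_weight_params(layer.innerProduct.bias)
        elif kind == "batchnorm":
            for wp in (layer.batchnorm.gamma, layer.batchnorm.beta,
                       layer.batchnorm.mean, layer.batchnorm.variance):
                promote_weight_params(wp)

fp32_model = ct.models.MLModel(spec)
fp32_model.save("objectDetection-fp32.mlmodel")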

Stack Trace

Quantizing using linear quantization
Optimizing Neural Network before Quantization:
Traceback (most recent call last):
  File "/Users/username/Desktop/COMPRESSION/compression.py", line 5, in <module>
    quantized_model = quantization_utils.quantize_weights(mlmodel, 8)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 1642, in quantize_weights
    qspec = _quantize_spec_weights(spec, nbits, qmode, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 1128, in _quantize_spec_weights
    _quantize_spec_weights(model_spec, nbits, quantization_mode, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 1113, in _quantize_spec_weights
    _quantize_nn_spec(spec.neuralNetwork, nbits, quantization_mode, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/quantization_utils.py", line 723, in _quantize_nn_spec
    _optimize_nn(layers)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/optimization_utils.py", line 213, in _optimize_nn
    _conv_bn_fusion(int(conv_idx), int(output_idx), layers)
  File "/usr/local/lib/python3.9/site-packages/coremltools/models/neural_network/optimization_utils.py", line 127, in _conv_bn_fusion
    wp = (gamma / _np.sqrt(variance))[:, None] * w
ValueError: operands could not be broadcast together with shapes (0,1) (128,0)
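
If I read the shapes in the last frame correctly, gamma has 0 elements and the conv weights come out as (128, 0), so the conv-BN fusion pass seems to be reading empty Float32 weight arrays; presumably those are the layers stored as Float16 in the mixed-precision model. A small diagnostic sketch to check which layers have empty floatValue arrays (same NeuralNetwork.proto field names and Create ML pipeline assumption as above):

import coremltools as ct

spec = ct.models.MLModel("objectDetection.mlmodel").get_spec()
for sub in spec.pipeline.models:
    if sub.WhichOneof("Type") != "neuralNetwork":
        continue
    for layer in sub.neuralNetwork.layers:
        kind = layer.WhichOneof("layer")
        if kind == "convolution":
            w = layer.convolution.weights
            print(layer.name, "conv: floatValue len =", len(w.floatValue),
                  "/ float16Value bytes =", len(w.float16Value))
        elif kind == "batchnorm":
            g = layer.batchnorm.gamma
            print(layer.name, "batchnorm: gamma floatValue len =", len(g.floatValue),
                  "/ float16Value bytes =", len(g.float16Value))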

To Reproduce

Run this script on an object detection model generated with Create ML using transfer learning.

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load the Create ML object detection model.
mlmodel = ct.models.MLModel('objectDetection.mlmodel')
# Quantize the weights down to 8 bits (linear quantization by default).
quantized_model = quantization_utils.quantize_weights(mlmodel, 8)
quantized_model.save("objectDetection-8bits.mlmodel")
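
For comparison, the Float16 quantization mentioned above, which does succeed on my model, is the same call with nbits=16:

import coremltools as ct
from coremltools.models.neural_network import quantization_utils

mlmodel = ct.models.MLModel('objectDetection.mlmodel')
# nbits=16 converts the weights to Float16; this path works on my model.
fp16_model = quantization_utils.quantize_weights(mlmodel, 16)
fp16_model.save("objectDetection-16bits.mlmodel")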

System environment:

  • coremltools version: 6.1

chillyjee avatar Nov 30 '22 06:11 chillyjee

Can you share your objectDetection.mlmodel?

TobyRoseman avatar Nov 30 '22 20:11 TobyRoseman

Here is the model: quantization.zip

chillyjee avatar Dec 01 '22 09:12 chillyjee

I'm facing the same issue; the error happens for both Transfer Learning and Full Network models.

alpaycli avatar Feb 04 '25 12:02 alpaycli