coremltools 8.0b1 linear_quantize_weights is not working for models converted with minimum_deployment_target=ct.target.iOS18
🐞Describing the bug
I'm testing 8.0 beta 1 for linear quantization. If the model is converted with minimum_deployment_target=ct.target.iOS18, linear_quantize_weights does not appear to perform weight quantization (checked by loading the model in Xcode: storage remains Float16, and Compute is missing Int8).
If I change minimum_deployment_target to iOS17, the quantization is correct and Xcode shows the model with Int8.
Stack Trace
To Reproduce
# iOS18 TEST: linear quant int8, int4
import torch
import torch.nn as nn
import torch.nn.functional as F
import coremltools as ct
import coremltools.optimize as cto
import numpy as np

SIZE = 224

# Set the seed for reproducibility
seed = 42
torch.manual_seed(seed)

# Define a simple layer module we'll reuse in our network.
class Layer(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super(Layer, self).__init__()
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch_size, channels, height, width = x.shape
        x = x.view(batch_size, channels, -1)  # Flatten the height and width
        x = x.permute(0, 2, 1)  # Rearrange for the linear layer
        x = self.linear(x)
        x = x.permute(0, 2, 1).reshape(batch_size, -1, height, width)  # Reshape back to the original dimensions
        x = F.relu(x)
        x = F.max_pool2d(x, (2, 2))
        return x

# A simple network consisting of several base layers.
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.layer1 = Layer(3, 6)
        self.layer2 = Layer(6, 16)
        self.classifier = nn.Linear(16 * 56 * 56, 8)  # Initialize the classifier with the correct input size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.layer1(x)
        x = self.layer2(x)
        x = 5.0 * x
        x = 6.3 * x
        x = x.reshape(x.size(0), -1)  # Flatten the tensor before the classifier
        x = self.classifier(x)
        x = F.log_softmax(x, dim=1)  # Use log_softmax for classification
        return x

# Create the model instance
model = SimpleNet()
model.eval()  # Set the model to evaluation mode

# Prepare an example input
example_input = torch.randn(1, 3, SIZE, SIZE)

# Trace the model
print("Trace the model")
try:
    traced_model = torch.jit.trace(model, example_input)
except Exception as e:
    print(f"Tracing failed: {e}")
    print("Attempting to script the model instead")
    traced_model = torch.jit.script(model)

print("Convert to CoreML (iOS17)")
coreml_model = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.iOS17,
    inputs=[ct.ImageType(name="input_1", shape=example_input.shape)],
)

print("Quantize model - int8")
config = cto.coreml.OptimizationConfig()
global_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric", dtype=np.int8, weight_threshold=127
)
config.set_global(global_config)
Xmodel = cto.coreml.linear_quantize_weights(coreml_model, config)

print("Saving models")
Xmodel.save("newmodel-A81-8.mlpackage")
coreml_model.save("newmodel-A81-16.mlpackage")

print("Convert to CoreML 8.0 (iOS18)")
coreml_model = ct.convert(
    traced_model,
    minimum_deployment_target=ct.target.iOS18,
    inputs=[ct.ImageType(name="input_1", shape=example_input.shape)],
)

print("Quantize model - int4")
config = cto.coreml.OptimizationConfig()
global_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric",
    dtype=ct.converters.mil.mil.types.int4,  # int4 requires the iOS18 deployment target
    weight_threshold=64,
)
config.set_global(global_config)
Xmodel = cto.coreml.linear_quantize_weights(coreml_model, config)
Xmodel.save("newmodel-A81-4.mlpackage")
print("CoreML model saved")
System environment (please complete the following information):
- coremltools version: 8.0b1
- OS (e.g. MacOS version or Linux type): macOS 15 beta
- Any other relevant version information (e.g. PyTorch or TensorFlow version):
Additional context
@dessatel, thank you so much for reporting this issue with the detailed steps.
I confirmed that I can reproduce this. I just want to mention that if you check the quantized model's disk size, it matches the iOS17 version. It's more likely an Xcode display issue, and we are working on it. Thanks!
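For anyone else who wants to run the same check, a minimal sketch of the disk-size comparison (the helper name is illustrative; the package names are the ones saved by the repro script above):

import os

def mlpackage_size(path: str) -> int:
    # An .mlpackage is a directory; sum the sizes of every file inside it.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Package names as saved by the repro script above.
for pkg in ("newmodel-A81-16.mlpackage", "newmodel-A81-8.mlpackage"):
    print(pkg, mlpackage_size(pkg) / 1024, "KiB")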
Thank you. Could you confirm that int4 works as well? The iOS18 target is definitely required in that case, which was the main motivation for the test, and it was producing the same issue. Is there a way to check a model's supported data types via some metadata call from coremltools?
You could use the model size to check whether int4 is actually used (for example, it should be around 1/2 of the int8 model's size).
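With a helper like the mlpackage_size sketch above, the ratio check could be:

int8_size = mlpackage_size("newmodel-A81-8.mlpackage")
int4_size = mlpackage_size("newmodel-A81-4.mlpackage")
# For a weight-dominated model, a ratio of roughly 0.5 indicates int4 storage is in use.
print(f"int4/int8 size ratio: {int4_size / int8_size:.2f}")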
Indeed, the size is 1/2 for 4-bit linear quantization, and the MD5 of weight.bin is the same for the iOS17 and iOS18 exports at 8 bits. Exporting the model's spec, I can see an INT4 reference for the 4-bit quantization, so an Xcode issue is likely: value { arguments { value { type { tensorType { dataType: INT4 rank: 2 dimensions { constant { size: 16 } } dimensions { constant { size: 6 } } } } blobFileValue { fileName: "@model_path/weights/weight.bin" offset: 192 } }
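On the metadata question: I'm not aware of a single coremltools call that reports the stored weight dtypes, but a rough sketch along these lines walks the MIL program proto behind get_spec() and prints the declared type of every op attribute (the helper name is illustrative, and the proto field layout is inferred from the spec dump above plus the MIL.proto schema):

import coremltools as ct
from coremltools.proto import MIL_pb2

def print_const_dtypes(path: str) -> None:
    # Load the package without compiling it, then walk the MIL program proto.
    mlmodel = ct.models.MLModel(path, skip_model_load=True)
    prog = mlmodel.get_spec().mlProgram
    for func_name, func in prog.functions.items():
        for block in func.block_specializations.values():
            for op in block.operations:
                for attr_name, value in op.attributes.items():
                    if not value.type.HasField("tensorType"):
                        continue
                    dtype = MIL_pb2.DataType.Name(value.type.tensorType.dataType)
                    where = "blob" if value.HasField("blobFileValue") else "immediate"
                    print(f"{func_name}: {op.type}.{attr_name} -> {dtype} ({where})")

print_const_dtypes("newmodel-A81-4.mlpackage")  # should list INT4 tensors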
FB14257215 Xcode beta 3 has the same issue
This issue is resolved in Xcode beta 4; both Int4 and Int8 are showing up correctly. Version 16.0 beta 4 (16A5211f). FB14257215