
Difference between Core ML and PyTorch inference

Open onurtore opened this issue 1 month ago • 2 comments

Hi!

I am trying to understand the discrepancy between the Core ML and PyTorch inference results for our model. For specific heads and test cases the outputs differ by at least 10%, and I need to find the reason.

This is a screenshot of the model's input in the Netron app:

[Image]

The MUL and ADD are vectors. (The current coremltools does not support vectors for the bias operation, but I changed that locally; see this issue: https://github.com/apple/coremltools/issues/2619)

The values for MUL and ADD are:
MUL: [0.01464911736547947, 0.015123673714697361, 0.015288766473531723]
ADD: [-1.8590586185455322, -1.7242575883865356, -1.5922027826309204]

This is the code I use to run inference in the development environment:

import cv2
import torch
from torch import nn
from pathlib import Path
from typing import List

def test_on_single_image(multihead_model: nn.Module, image_path: Path) -> List[float]:
    """
    Default method to test the model on a single image.
    """
    img_bgr = cv2.imread(str(image_path))  # cv2.imread expects a string path
    if img_bgr is None:
        raise RuntimeError(f"Could not read image: {image_path}")
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    # Note: interpolation must be passed by keyword; the third positional
    # argument of cv2.resize is dst, not the interpolation flag.
    img_rgb = cv2.resize(img_rgb, (224, 224), interpolation=cv2.INTER_LINEAR)
    # Convert to tensor and add batch dimension
    img_tensor = (
        torch.from_numpy(img_rgb).permute(2, 0, 1).unsqueeze(0)
    )  # Shape: 1,3,H,W
    img_tensor = img_tensor.float() / 255.0
    # Per-channel normalization with the same MEAN/STD as in training
    for c in range(3):
        img_tensor[0, c, :, :] = (img_tensor[0, c, :, :] - MEAN[c]) / STD[c]

    outputs = multihead_model(img_tensor)
    return outputs

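For reference, here is a minimal sketch of how the Core ML side can be run for comparison (assuming the converted model is saved as model.mlpackage and the image input is named colorImage; adjust the names to match the actual model):

import coremltools as ct
import numpy as np
from PIL import Image

# Hypothetical model path and input name -- adjust to the converted model.
mlmodel = ct.models.MLModel("model.mlpackage")

# Core ML image inputs are fed as PIL images; resizing to the declared input
# size here keeps the runtime from having to resize anything itself.
pil_img = Image.open("test.jpg").convert("RGB").resize((224, 224), Image.BILINEAR)

coreml_out = mlmodel.predict({"colorImage": pil_img})
print({k: np.asarray(v).ravel()[:5] for k, v in coreml_out.items()})
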
The values for MEAN and STD are:
MEAN = [0.49767, 0.4471, 0.4084]
STD = [0.2677, 0.2593, 0.2565]
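
As a sanity check on the constants themselves (my own back-of-the-envelope arithmetic, not something read out of the converter), the MUL/ADD vector shown in Netron should be the /255 step and the per-channel normalization folded together, i.e. MUL = 1 / (255 * STD) and ADD = -MEAN / STD:

import numpy as np

MEAN = np.array([0.49767, 0.4471, 0.4084])
STD = np.array([0.2677, 0.2593, 0.2565])

mul_expected = 1.0 / (255.0 * STD)   # ~[0.014649, 0.015124, 0.015289]
add_expected = -MEAN / STD           # ~[-1.85906, -1.72426, -1.59220]

# Values read off the Netron graph above
mul_netron = np.array([0.01464911736547947, 0.015123673714697361, 0.015288766473531723])
add_netron = np.array([-1.8590586185455322, -1.7242575883865356, -1.5922027826309204])

print(np.abs(mul_expected - mul_netron).max())  # ~0
print(np.abs(add_expected - add_netron).max())  # ~0

Since these agree, the normalization constants baked into the Core ML model match the PyTorch preprocessing, so the difference must already be present in the raw pixels entering the biased image.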

img_tensor[0, 0, 0, 0] is 1.2319, which matches the first value of colorImage__biased__ (1.2319052). However, the second value of colorImage__biased__ is 1.1586595, which differs from img_tensor[0, 0, 0, 1], which is 1.0561.

I compared the ordering of colorImage__biased__ and img_tensor to check whether the discrepancy comes from row/channel ordering, but it does not. The values at img_tensor[0, 1, 0, 1] and the corresponding colorImage__biased__ entry are also different.
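
A quick way to rule out layout entirely, assuming the colorImage__biased__ tensor has been dumped to a NumPy array (coreml_biased below is that hypothetical array), is to compare it against the obvious reorderings of the PyTorch tensor:

import numpy as np

# pt: the PyTorch input as NumPy, shape (3, 224, 224), RGB, channels-first
# coreml_biased: hypothetical dump of the colorImage__biased__ tensor
pt = img_tensor[0].numpy()
cm = np.asarray(coreml_biased).reshape(3, 224, 224)

checks = {
    "same layout": cm,
    "channels reversed (BGR vs RGB)": cm[::-1, :, :],
    "rows/cols swapped": cm.transpose(0, 2, 1),
}
for name, arr in checks.items():
    print(name, np.abs(arr - pt).max())
# If none of these give a near-zero max difference, the mismatch is not a
# simple ordering issue.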

The only thing I believe can differ is the interpolation method used by Core ML. OpenCV uses cv2.INTER_LINEAR, but I do not know what the Apple inference pipeline uses. I tried different interpolation algorithms on my side, but the results still do not match.
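
To see how much the resizer alone contributes, one thing worth trying (my own sketch; I do not know what resampling Core ML or the app-side pipeline actually uses) is to compare OpenCV's bilinear resize against PIL's, and, more importantly, to feed an image that is already 224x224 so no resize can happen outside your control:

import cv2
import numpy as np
from PIL import Image

img_bgr = cv2.imread("test.jpg")  # hypothetical test image path
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)

# Bilinear resize via OpenCV (interpolation must be passed by keyword).
cv_resized = cv2.resize(img_rgb, (224, 224), interpolation=cv2.INTER_LINEAR)

# Bilinear resize via PIL, as a stand-in for a different implementation.
pil_resized = np.asarray(Image.fromarray(img_rgb).resize((224, 224), Image.BILINEAR))

print("max abs pixel diff:", np.abs(cv_resized.astype(int) - pil_resized.astype(int)).max())

Per-pixel differences of a few values per channel are normal between bilinear implementations (half-pixel conventions, anti-aliasing). If the Core ML and PyTorch outputs agree once both are given the same pre-resized 224x224 image, the resizer is the source of the discrepancy.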

What are the possible reasons for this discrepancy?

The input image used in the example: [Image]

onurtore avatar Nov 09 '25 15:11 onurtore

Take a look at our Model Debugging User Guide Page.

TobyRoseman avatar Nov 14 '25 16:11 TobyRoseman

Take a look at our Model Debugging User Guide Page.

Thanks, will look into that, but can Apple share the name of the resizing algorithm? Is that a possibility?

onurtore avatar Nov 15 '25 17:11 onurtore