
Can't figure out NSLocalizedDescription = "Error in declaring network.";

simicvm opened this issue 2 years ago · 6 comments

I have two PyTorch models, one text Transformer and one image Transformer. For both models I used ct.TensorType as input.

Both models were converted successfully on the following platform:

  • Ubuntu 22.04 LTS
  • Python 3.9.13
  • PyTorch 1.10.1
  • coremltools 6.0b2

Now I am trying to load the converted Core ML models on a 2021 MacBook Pro (M1):

  • macOS 12.6
  • Python 3.9.13
  • PyTorch 1.12.1
  • coremltools 6.0b2

Text Transformer works fine and produces identical results to its PyTorch counterpart.

The Image Transformer model gives me the NSLocalizedDescription = "Error in declaring network."; error on load, and I can't figure out what the issue is.

I know that PyTorch 1.12.1 is not supported, but I am just doing inference with the already converted model, and PyTorch is not being used in this case.

All tensors are less than rank 5.
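
For context, a minimal sketch of the load-and-predict path that fails (the file name, input key, and shape here are hypothetical):

import coremltools as ct
import numpy as np

# Loading the converted package is where the
# "Error in declaring network." message first appears.
image_encoder = ct.models.MLModel("ImageEncoder.mlpackage")  # hypothetical path

# Dummy input matching the traced input shape (hypothetical).
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
out = image_encoder.predict({"image": dummy})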

This is the Image Transformer model:

def forward(self, x: torch.Tensor):
    x = self.conv1(x)  # shape = [*, width, grid, grid]
    x = x.reshape(x.shape[0], x.shape[1], -1)  # shape = [*, width, grid ** 2]
    x = x.permute(0, 2, 1)  # shape = [*, grid ** 2, width]
    x = torch.cat(
        [self.class_embedding.to(x.dtype) +
         torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device),
         x], dim=1)  # shape = [*, grid ** 2 + 1, width]
    x = x + self.positional_embedding.to(x.dtype)
    x = self.ln_pre(x)

    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.transformer(x)
    x = x.permute(1, 0, 2)  # LND -> NLD

    x = self.ln_post(x[:, 0, :])

    if self.proj is not None:
        x = x @ self.proj

    return x

where self.transformer is this:

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint
from typing import Callable, Optional


class Transformer(nn.Module):
    def __init__(self, width: int, layers: int, heads: int, mlp_ratio: float = 4.0, act_layer: Callable = nn.GELU):
        super().__init__()
        self.width = width
        self.layers = layers
        self.grad_checkpointing = False

        self.resblocks = nn.ModuleList([
            ResidualAttentionBlock(width, heads, mlp_ratio, act_layer=act_layer)
            for _ in range(layers)
        ])

    def forward(self, x: torch.Tensor, attn_mask: Optional[torch.Tensor] = None):
        for r in self.resblocks:
            if self.grad_checkpointing and not torch.jit.is_scripting():
                x = checkpoint(r, x, attn_mask)
            else:
                x = r(x, attn_mask=attn_mask)
        return x

simicvm · Sep 17 '22

There is also a big difference in execution time between the PyTorch and Core ML text models:

  • PyTorch: 0.27843 seconds
  • CoreML: 7.80541 seconds

simicvm · Sep 17 '22

We just released coremltools 6.0. This version does support PyTorch 1.12.1.

Are you converting to neuralnetwork or mlprogram? If you're converting to neuralnetwork or not specifying that value, try using mlprogram, i.e. add convert_to='mlprogram' to your coremltools.convert call.

If you're still getting this error when using both coremltools 6.0 and mlprogram, please share complete details to reproduce the issue.
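
For illustration, a minimal convert call with that argument might look like this (the model and input shape are placeholders, not from this issue):

import coremltools as ct

mlmodel = ct.convert(
    traced_model,  # output of torch.jit.trace (placeholder name)
    convert_to="mlprogram",  # target the ML Program backend
    inputs=[ct.TensorType(shape=(1, 3, 224, 224), name="image")],
)
mlmodel.save("model.mlpackage")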

TobyRoseman · Sep 20 '22

> There is also a big difference in execution time between the PyTorch and Core ML text models:
>
> • PyTorch: 0.27843 seconds
> • CoreML: 7.80541 seconds

Are you measuring performance on the same machine with the same device (i.e. CPU vs GPU)? If you think there is a problem here, please open a new issue. Please include complete steps to reproduce the conversion, as well as complete steps to get predictions from both models.
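
For an apples-to-apples timing, the Core ML model can be pinned to a specific compute unit at load time. A sketch (the model path is hypothetical):

import coremltools as ct

# Restrict execution to the CPU so the timing is comparable
# to a CPU-only PyTorch run.
cpu_model = ct.models.MLModel(
    "TextEncoder.mlpackage",
    compute_units=ct.ComputeUnit.CPU_ONLY,
)

# Default: let Core ML schedule across CPU, GPU, and the Neural Engine.
fast_model = ct.models.MLModel(
    "TextEncoder.mlpackage",
    compute_units=ct.ComputeUnit.ALL,
)

Note also that the first predict() call can include one-time model compilation, so it is worth timing warmed-up calls only.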

TobyRoseman · Sep 20 '22

> We just released coremltools 6.0. This version does support PyTorch 1.12.1.
>
> Are you converting to neuralnetwork or mlprogram? If you're converting to neuralnetwork or not specifying that value, try using mlprogram, i.e. add convert_to='mlprogram' to your coremltools.convert call.
>
> If you're still getting this error when using both coremltools 6.0 and mlprogram, please share complete details to reproduce the issue.

I was converting to mlprogram. Let me try coremltools 6.0 and get back to you.

> Are you measuring performance on the same machine with the same device (i.e. CPU vs GPU)? If you think there is a problem here, please open a new issue. Please include complete steps to reproduce the conversion, as well as complete steps to get predictions from both models.

Yes, I am measuring the performance on the same machine, a MacBook Pro M1. I'll check if anything changes with the new coremltools.

simicvm · Sep 20 '22

This is the result after using coremltools 6.0:

Machine running the conversion: 2021 MacBook Pro (M1)

  • macOS 12.6
  • Python 3.9.13
  • PyTorch 1.12.1
  • coremltools 6.0

Conversion commands

import coremltools as ct

# Image Transformer
image_encoder_model = ct.convert(
    traced_image_encoder,
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT32,
    inputs=[ct.TensorType(shape=example_image.shape, name="image")]
)

# Text Transformer
text_encoder_model = ct.convert(
    traced_text_encoder,
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT32,
    inputs=[ct.TensorType(shape=example_text.shape, name="text")]
)
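
(Here traced_image_encoder and traced_text_encoder are assumed to be the outputs of torch.jit.trace; for completeness, a hypothetical tracing step for the image branch:)

import torch

image_encoder.eval()  # hypothetical module name
example_image = torch.rand(1, 3, 224, 224)  # hypothetical input shape
traced_image_encoder = torch.jit.trace(image_encoder, example_image)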

Output of the conversion process for the Image Transformer:

Converting PyTorch Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████▉| 2035/2036 [00:00<00:00, 4953.12 ops/s]
Running MIL Common passes:   5%|███▊                                                                     | 2/38 [00:00<00:04,  7.33 passes/s]
/Users/marko/miniconda3/envs/pytorch_arm/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:129: UserWarning: Output, '2818', of the source model, has been renamed to 'var_2818' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|████████████████████████████████████████████████████████████████████████| 38/38 [00:01<00:00, 25.72 passes/s]
Running MIL Clean up passes: 100%|██████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00, 10.25 passes/s]
/Users/marko/miniconda3/envs/pytorch_arm/lib/python3.9/site-packages/coremltools/models/model.py:156: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: {
    NSLocalizedDescription = "Error in declaring network.";
}
  _warnings.warn(

Output of the conversion process for the Text Transformer:

Converting PyTorch Frontend ==> MIL Ops:   4%|██▍                                                      | 66/1565 [00:00<00:00, 7362.73 ops/s]
Traceback (most recent call last):
  File "/Users/marko/code-snippets/python/convert-openclip-coreml.py", line 70, in <module>
    convert_openclip_coreml()
... #truncated error stack for brevity
File "/Users/marko/miniconda3/envs/pytorch_arm/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    raise RuntimeError(
RuntimeError: PyTorch convert function for op 'baddbmm' not implemented.

Summary:

  • Image Transformer: the model was converted, but I got the same error message as before, this time raised during/after the conversion process.
  • Text Transformer: the model was not converted due to an unsupported PyTorch operation. However, coremltools 6.0b2 was able to convert this model before, so I suspect support for the baddbmm operation was removed in the stable 6.0 release.

simicvm · Sep 20 '22

We did not remove support for torch.baddbmm. We never supported it. We're already tracking that issue in #1555. It seems the new version of PyTorch uses this op more often than before.
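
(As a stopgap for a missing op like this, a composite implementation can be registered before conversion. Below is a rough, untested sketch that decomposes baddbmm, i.e. beta * input + alpha * (batch1 @ batch2), into MIL primitives; the registry internals may differ between coremltools versions:)

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op
def baddbmm(context, node):
    # torch.baddbmm(input, batch1, batch2, *, beta=1, alpha=1)
    #   = beta * input + alpha * (batch1 @ batch2)
    bias, batch1, batch2, beta, alpha = _get_inputs(context, node, expected=5)
    if beta.val != 1.0:
        bias = mb.mul(x=beta, y=bias)        # scale the additive term
    if alpha.val != 1.0:
        batch1 = mb.mul(x=alpha, y=batch1)   # fold alpha into one bmm operand
    bmm = mb.matmul(x=batch1, y=batch2)      # batched matrix multiply
    context.add(mb.add(x=bias, y=bmm, name=node.name))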

In order to make progress with the image transformer, we need to be able to reproduce the problem. Ideally we'd have a minimal example to reproduce the issue. Can you share standalone code to reproduce your Image Transformer conversion issue?

TobyRoseman · Sep 21 '22

> We did not remove support for torch.baddbmm. We never supported it. We're already tracking that issue in #1555. It seems the new version of PyTorch uses this op more often than before.

That is strange. As you can see from my first message, with coremltools 6.0b2 I was able to convert that Text Transformer, use it for inference, and validate its correctness against the PyTorch model. After upgrading to coremltools 6.0, that conversion now fails, but I haven't changed the models at all.

> In order to make progress with the image transformer, we need to be able to reproduce the problem. Ideally we'd have a minimal example to reproduce the issue. Can you share standalone code to reproduce your Image Transformer conversion issue?

I'll post the code a bit later.

simicvm · Sep 22 '22

Since we have not heard back here, I'm going to close this issue. If we get the code to reproduce this issue, I'll reopen it.

TobyRoseman · Nov 14 '22