
Facing issues while converting a pre-trained PyTorch model to CoreML

Open · Prasad-Techlead opened this issue on Jun 16, 2022

Hello, I'm trying to convert a pre-trained model from PyTorch to CoreML and have written a script to do so. I'm able to load the model and convert it to TorchScript with both methods (i.e. tracing and scripting). However, calling coremltools.convert() on either the traced or the scripted model throws an error. The scripts for both methods, along with the errors they produce, are below.

System Information

macOS = 12.4
Python = 3.9
protobuf = 3.19.0
coremltools = 6.0b1
torch = 1.10.2
torchvision = 0.11.3

Note - I have tried multiple versions of the libraries mentioned above, but that did not help.

Method 1 -> Tracing

Code -

import coremltools
import numpy as np
import torch
import torchvision


def do_trace(in_model, in_input):
    model_trace = torch.jit.trace(in_model, in_input)
    model_trace.eval()
    return model_trace


def dict_to_tuple(out_dict):
    if "masks" in out_dict.keys():
        return out_dict["boxes"], out_dict["scores"], out_dict["labels"], out_dict["masks"]
    return out_dict["boxes"], out_dict["scores"], out_dict["labels"]


class PredictionModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)

    def forward(self, in_input):
        output = self.model(in_input)
        return dict_to_tuple(output[0])

inp = torch.Tensor(np.random.uniform(0.0, 250.0, size=(1, 3, 300, 300)))

model = PredictionModel().eval()
with torch.no_grad():
    output = model(inp)
    trace_model = do_trace(model, inp)

ml_model = coremltools.convert(trace_model, inputs=[coremltools.TensorType(shape=(1, 3, 300, 300))])
print(ml_model)

Error -

Converting PyTorch Frontend ==> MIL Ops: 3%|▎ | 74/2627 [00:00<00:05, 436.01 ops/s]
Traceback (most recent call last):
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 91, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 263, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 89, in convert_nodes
    add_op(context, node)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 3973, in reciprocal
    context.add(mb.inverse(x=inputs[0], name=node.name))
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 63, in add_op
    return cls._add_op(op_cls, **kwargs)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/builder.py", line 191, in _add_op
    new_op.type_value_inference()
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/operation.py", line 244, in type_value_inference
    output_vals = self._auto_val(output_types)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/operation.py", line 354, in _auto_val
    builtin_val.val = v
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/types/type_tensor.py", line 93, in val
    raise ValueError(
ValueError: tensor should have value of type ndarray, got <class 'numpy.float32'> instead

Method 2 -> Scripting

Code -

import coremltools
import torch
import torchvision

model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
script_model = torch.jit.script(model)

ml_model = coremltools.convert(script_model, inputs=[coremltools.TensorType(shape=(1, 3, 300, 300))])
print(ml_model)

Error -

WARNING:root:Support for converting Torch Script Models is experimental. If possible you should use a traced model for conversion.
Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/techlead/PycharmProjects/conversion_demo/main.py", line 8, in <module>
    ml_model = coremltools.convert(script_model, inputs=[coremltools.TensorType(shape=(1, 3, 300, 300))])
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 426, in convert
    mlmodel = mil_convert(
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 182, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 209, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 272, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 104, in __call__
    return load(*args, **kwargs)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 51, in load
    converter = TorchConverter(torchscript, inputs, outputs, cut_at_symbols)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 158, in __init__
    raw_graph, params_dict = self._expand_and_optimize_ir(self.torchscript)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 478, in _expand_and_optimize_ir
    graph, params_dict = TorchConverter._jit_pass_lower_graph(graph, torchscript)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 423, in _jit_pass_lower_graph
    _lower_graph_block(graph)
  File "/Users/techlead/PycharmProjects/conversion_demo/venv/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 402, in _lower_graph_block
    module = getattr(node_to_module_map[_input], attr_name)
KeyError: images.7 defined in (%images.7 : __torch__.torchvision.models.detection.image_list.ImageList, %targets.31 : Dict(str, Tensor)[]? = prim::TupleUnpack(%405)
)

If you run either of the snippets above, you'll see that the model is successfully converted to TorchScript (both trace and script), but the last step, converting the TorchScript model to CoreML, fails. Please have a look at this issue and let me know how I can move forward. Also, if I'm doing something wrong (for example, passing the inputs incorrectly), please let me know that as well. This is my first time doing this, so I'm kind of a noob. Any help is appreciated. Thank you!

Prasad-Techlead commented Jun 16, 2022

Your code to convert the traced model looks good to me. Please note that our support for PyTorch scripting is experimental, so I would definitely recommend using traced models if you can.

I suspect this is a bug in coremltools. Can you get predictions from the traced PyTorch model? If so, it's almost certainly a bug with coremltools.
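
As a quick way to check that, here is a minimal sketch against the tracing snippet above (trace_model and inp are the names from that snippet); if this runs and returns tensors, the failure is on the coremltools side:

# Run the traced model directly to confirm the trace itself works.
# With the random input above the detections may be empty, but the call
# should still succeed and return the (boxes, scores, labels) tuple.
with torch.no_grad():
    boxes, scores, labels = trace_model(inp)
print(boxes.shape, scores.shape, labels.shape)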

torchvision.models.detection.keypointrcnn_resnet50_fpn looks like a complicated model. Can you work to isolate the problem? Ideally we would have a simple toy network that reproduces the issue.

TobyRoseman commented Jun 16, 2022

Hi, first of all thank you for your prompt reply. To answer your first question: yes, I'm able to get predictions from the traced model, so it might be an issue with coremltools itself.

Adding one point to the discussion: even if you change the torchvision.models.detection.keypointrcnn_resnet50_fpn model to torchvision.models.detection.fasterrcnn_resnet50_fpn, the result is the same.
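
For reference, a minimal sketch of that variation (FasterRCNNWrapper is just an illustrative name here; it mirrors the PredictionModel class from the tracing script above, with only the detection model swapped):

import coremltools
import torch
import torchvision

class FasterRCNNWrapper(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

    def forward(self, x):
        # Faster R-CNN also returns a dict with boxes / labels / scores.
        out = self.model(x)[0]
        return out["boxes"], out["scores"], out["labels"]

inp = torch.rand(1, 3, 300, 300) * 250.0
with torch.no_grad():
    traced = torch.jit.trace(FasterRCNNWrapper().eval(), inp)

# Per the comment above, this fails with the same conversion error as the keypoint model.
ml_model = coremltools.convert(traced, inputs=[coremltools.TensorType(shape=(1, 3, 300, 300))])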

Finally, yes, I'll try to come up with a small toy network that reproduces the issue, but in the meantime, is there any workaround to convert the model?

Thank you!

Prasad-Techlead commented Jun 17, 2022

Since we have not received steps to reproduce this problem, I'm going to close it. If we get steps to reproduce it, I'm happy to reopen.

TobyRoseman commented Dec 12, 2022

@TobyRoseman Here's a small self-contained script to reproduce it, using torch 1.13 and coremltools 6.2.

import torch
import coremltools as ct

@torch.jit.script
def _padded_size(x):
    """Pytorch cannot trace x.shape[] code. We therefore create a little TorchScript function."""
    padded_size = 2 ** torch.ceil(torch.log2(torch.tensor(x.size(-1) - 1).to(torch.float)))
    return padded_size.int()

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.zeros((1, _padded_size(x)))

model = MyModel().eval()


with torch.inference_mode():
    example_input = torch.rand(10, 1000)
    traced_model = torch.jit.trace(model, example_input)

coreml_model = ct.convert(
    traced_model,
    convert_to='neuralnetwork',  # or 'mlprogram', same result
    inputs=[ct.TensorType(shape=example_input.shape, name='x')],
    debug=True
)

This line is the crux:

padded_size = 2 ** torch.ceil(torch.log2(torch.tensor(x.size(-1) - 1).to(torch.float)))

Without the .to(...) call, torch tries to take a log2 of an integer and fails. I've also tried torch.float32, but that fails with the same error as torch.float.
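
For context, a minimal illustration (not part of the original report) of the dtypes involved: torch.tensor(x.size(-1) - 1) is an integer tensor, and the cast is what turns it into a float before log2 is applied.

import torch

x = torch.rand(10, 1000)
n = torch.tensor(x.size(-1) - 1)   # tensor(999), dtype torch.int64
n_float = n.to(torch.float)        # tensor(999.), dtype torch.float32
print(n.dtype, n_float.dtype)      # torch.int64 torch.float32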

tcwalther commented Feb 24, 2023

Thank you for the nice code snippet to reproduce the issue. I can reproduce it as well.

I see that there are constants in the torch graph with size 0 or an empty shape, and those are what trigger this error in the coremltools code. I'm reopening this issue, as it needs further investigation.
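
For what it's worth, a minimal sketch of why that check trips, based on the type_tensor.py code quoted in the tracebacks in this thread (it rejects anything that is not an np.ndarray): a NumPy scalar such as numpy.float32 is not an ndarray, while a 0-d array holding the same value is.

import numpy as np

scalar = np.float32(3.0)                  # a NumPy scalar, type numpy.float32
zero_d = np.array(3.0, dtype=np.float32)  # a 0-d ndarray with the same value
print(isinstance(scalar, np.ndarray))     # False -> would trigger the ValueError above
print(isinstance(zero_d, np.ndarray))     # True  -> passes the check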

aseemw commented Feb 24, 2023

I actually managed to find a workaround: if I replace .to(torch.float) with .type_as(x), it works fine :).
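
For reference, a sketch of that workaround applied to the _padded_size helper from the repro script above (everything else unchanged; this assumes x is a floating-point tensor, as in the example input):

import torch

@torch.jit.script
def _padded_size(x):
    # .type_as(x) instead of .to(torch.float): same value, but the dtype now
    # follows x, which avoids the conversion error described above.
    padded_size = 2 ** torch.ceil(torch.log2(torch.tensor(x.size(-1) - 1).type_as(x)))
    return padded_size.int()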

tcwalther commented Feb 26, 2023

I'm having the same issue as @Prasad-Techlead mentions. I'm able to script and trace Torchvision models and get correct predictions, but when converting to CoreML I get the error: images.7 defined in (%images.7 : __torch__.torchvision.models.detection.image_list.ImageList, %targets.31 : Dict(str, Tensor)[]? = prim::TupleUnpack(%396))

I've tried multiple networks: retinanet_resnet50_fpn_v2, fasterrcnn_resnet50_fpn_v2 and ssdlite320_mobilenet_v3_large.

Has anyone managed to convert any of Torchvision's object detection models to CoreML?

lukasugar commented Mar 3, 2023

Using the tip of main, the code still does not work, but the error message and stack trace have changed:

ValueError: tensor should have value of type ndarray, got <class 'numpy.float32'> instead
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 24
     21     example_input = torch.rand(10, 1000)
     22     traced_model = torch.jit.trace(model, example_input)
---> 24 coreml_model = ct.convert(
     25     traced_model,
     26     convert_to='neuralnetwork',  # or 'mlprogram', same result
     27     inputs=[ct.TensorType(shape=example_input.shape, name='x')],
     28     debug=True
     29 )

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py:530, in convert(model, source, inputs, outputs, classifier_config, minimum_deployment_target, convert_to, compute_precision, skip_model_load, compute_units, package_dir, debug, pass_pipeline)
    527 if specification_version is None:
    528     specification_version = _set_default_specification_version(exact_target)
--> 530 mlmodel = mil_convert(
    531     model,
    532     convert_from=exact_source,
    533     convert_to=exact_target,
    534     inputs=inputs,
    535     outputs=outputs_as_tensor_or_image_types,  # None or list[ct.ImageType/ct.TensorType]
    536     classifier_config=classifier_config,
    537     skip_model_load=skip_model_load,
    538     compute_units=compute_units,
    539     package_dir=package_dir,
    540     debug=debug,
    541     specification_version=specification_version,
    542     main_pipeline=pass_pipeline,
    543 )
    545 if exact_target == "mlprogram" and mlmodel._input_has_infinite_upper_bound():
    546     raise ValueError(
    547         "For mlprogram, inputs with infinite upper_bound is not allowed. Please set upper_bound"
    548         ' to a positive value in "RangeDim()" for the "inputs" param in ct.convert().'
    549     )

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:188, in mil_convert(model, convert_from, convert_to, compute_units, **kwargs)
    149 @_profile
    150 def mil_convert(
    151     model,
   (...)
    155     **kwargs
    156 ):
    157     """
    158     Convert model from a specified frontend `convert_from` to a specified
    159     converter backend `convert_to`.
   (...)
    186         See `coremltools.converters.convert`
    187     """
--> 188     return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:212, in _mil_convert(model, convert_from, convert_to, registry, modelClass, compute_units, **kwargs)
    209     weights_dir = _tempfile.TemporaryDirectory()
    210     kwargs["weights_dir"] = weights_dir.name
--> 212 proto, mil_program = mil_convert_to_proto(
    213                         model,
    214                         convert_from,
    215                         convert_to,
    216                         registry,
    217                         **kwargs
    218                      )
    220 _reset_conversion_state()
    222 if convert_to == 'milinternal':

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:286, in mil_convert_to_proto(model, convert_from, convert_to, converter_registry, main_pipeline, **kwargs)
    281 frontend_pipeline, backend_pipeline = _construct_other_pipelines(
    282     main_pipeline, convert_from, convert_to
    283 )
    285 frontend_converter = frontend_converter_type()
--> 286 prog = frontend_converter(model, **kwargs)
    287 PassPipelineManager.apply_pipeline(prog, frontend_pipeline)
    289 PassPipelineManager.apply_pipeline(prog, main_pipeline)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/converter.py:108, in TorchFrontend.__call__(self, *args, **kwargs)
    105 def __call__(self, *args, **kwargs):
    106     from .frontend.torch.load import load
--> 108     return load(*args, **kwargs)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py:63, in load(model_spec, inputs, specification_version, debug, outputs, cut_at_symbols, **kwargs)
     55 inputs = _convert_to_torch_inputtype(inputs)
     56 converter = TorchConverter(
     57     torchscript,
     58     inputs,
   (...)
     61     specification_version,
     62 )
---> 63 return _perform_torch_convert(converter, debug)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py:102, in _perform_torch_convert(converter, debug)
    100 def _perform_torch_convert(converter, debug):
    101     try:
--> 102         prog = converter.convert()
    103     except RuntimeError as e:
    104         if debug and "convert function" in str(e):

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/converter.py:439, in TorchConverter.convert(self)
    436 self.convert_const()
    438 # Add the rest of the operations
--> 439 convert_nodes(self.context, self.graph)
    441 graph_outputs = [self.context[name] for name in self.graph.outputs]
    443 # An output can be None when it's a None constant, which happens
    444 # in Fairseq MT.

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py:92, in convert_nodes(context, graph)
     87         raise RuntimeError(
     88             f"PyTorch convert function for op '{node.kind}' not implemented."
     89         )
     91 context.prepare_for_conversion(node)
---> 92 add_op(context, node)
     94 # We've generated all the outputs the graph needs, terminate conversion.
     95 if _all_outputs_present(context, graph):

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py:5283, in log2(context, node)
   5281 inputs = _get_inputs(context, node)
   5282 x = inputs[0]
-> 5283 log_x = mb.log(x=x)
   5284 context.add(mb.mul(x=log_x, y=1 / _np.log(2.0)), node.name)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/registry.py:183, in SSAOpRegistry.register_op.<locals>.class_wrapper.<locals>.add_op(cls, **kwargs)
    180 else:
    181     op_cls_to_add = op_reg[op_type]
--> 183 return cls._add_op(op_cls_to_add, **kwargs)

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/builder.py:182, in Builder._add_op(cls, op_cls, **kwargs)
    180 curr_block()._insert_op_before(new_op, before_op=before_op)
    181 new_op.build_nested_blocks()
--> 182 new_op.type_value_inference()
    183 if len(new_op.outputs) == 1:
    184     return new_op.outputs[0]

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:256, in Operation.type_value_inference(self, overwrite_output)
    254 if not isinstance(output_types, tuple):
    255     output_types = (output_types,)
--> 256 output_vals = self._auto_val(output_types)
    257 try:
    258     output_names = self.output_names()

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py:396, in Operation._auto_val(self, output_types)
    394         builtin_val.val = v.ls
    395     else:
--> 396         builtin_val.val = v
    397     auto_val.append(builtin_val)
    398 return auto_val

File ~/miniconda3/envs/prod/lib/python3.10/site-packages/coremltools/converters/mil/mil/types/type_tensor.py:88, in tensor.<locals>.tensor.val(self, v)
     85 @val.setter
     86 def val(self, v):
     87     if not isinstance(v, np.ndarray):
---> 88         raise ValueError(
     89             "tensor should have value of type ndarray, got {} instead".format(
     90                 type(v)
     91             )
     92         )
     94     v_type = numpy_type_to_builtin_type(v.dtype)
     95     promoted_type = promote_types(v_type, primitive)

ValueError: tensor should have value of type ndarray, got <class 'numpy.float32'> instead

TobyRoseman commented Jul 31, 2023