
What is _ctx_tx_to_fallback_0 in context_transfer type?

Open quqixun opened this issue 2 years ago • 1 comments

❓Question

I am not sure whether this question is about coremltools, the Core ML framework, or Xcode.

Here is an example that converts a simple PyTorch model to a Core ML model.

import torch
import torch.nn as nn
import coremltools as ct


class SimpleModel(nn.Module):

    def __init__(self,):
        super(SimpleModel, self).__init__()

        self.weight = nn.Parameter(torch.randn(8, 8, 3, 3))

    def forward(self, x):
        return x * self.weight


if __name__ == '__main__':

    model_pytorch = SimpleModel()
    model_pytorch.eval()

    input_example = torch.randn(8, 8, 1, 1)
    model_traced = torch.jit.trace(model_pytorch, input_example)

    model_coreml = ct.convert(
        model_traced,
        convert_to    = 'neuralnetwork',
        source        = 'pytorch',
        inputs        = [ct.TensorType(name='x', shape=[8, 8, 1, 1])],
        outputs       = [ct.TensorType(name='output')],
        compute_units = ct.ComputeUnit.ALL,
    )
    model_coreml.save('SimpleModel.mlmodel')

Then open SimpleModel.mlmodel in Xcode and generate a performance report, as shown below:

(screenshot: simplemodel_performance_report)

The questions are:

  • What is _ctx_tx_to_fallback_0 in context_transfer type? I found nothing about _ctx_tx_to_fallback_0 or context_transfer in the Core ML documentation.
  • How do I modify the code to make this operation run on the Neural Engine?

Environment:

  • macOS Ventura 13.5.2
  • Xcode 14.3.1
  • Python 3.8.17
  • coremltools 7.0b2
  • torch 2.0.0

quqixun avatar Sep 14 '23 03:09 quqixun

The first op likely refers to the operation that captures transferring the input data to the execution engine. The gray tick indicates that the op should be supported on the Neural Engine (NE). Since this is such a tiny model, and there is some overhead in moving to the NE and then back to the CPU, the scheduler likely decided to stay on the CPU.

You may want to test with a model that has a long sequence of elementwise ops, or more computationally expensive ops like matmul, to give the scheduler a reason to dispatch to the NE.
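As a rough sketch of what such a heavier model might look like (the class name, layer count, and dimensions below are illustrative assumptions, not part of the original report), a stack of linear layers gives the scheduler a chain of matmuls whose cost can outweigh the CPU-to-NE transfer overhead:

```python
import torch
import torch.nn as nn


class HeavierModel(nn.Module):
    """A stack of linear layers (matmuls), so compute cost can
    outweigh the CPU <-> NE data-transfer overhead."""

    def __init__(self, dim=512, depth=8):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x


def convert_heavier_model(dim=512):
    # Imported here so the model definition above can be exercised
    # without coremltools installed.
    import coremltools as ct

    model = HeavierModel(dim=dim).eval()
    example = torch.randn(1, dim)
    traced = torch.jit.trace(model, example)
    return ct.convert(
        traced,
        convert_to='neuralnetwork',
        source='pytorch',
        inputs=[ct.TensorType(name='x', shape=[1, dim])],
        compute_units=ct.ComputeUnit.ALL,
    )
```

Regenerating the Xcode performance report for a model like this should show whether the scheduler now places the bulk of the ops on the NE; there is no API flag in ct.convert that forces NE placement, so making the workload worth the transfer is the practical lever.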

aseemw avatar Sep 14 '23 17:09 aseemw