
RuntimeError: PyTorch convert function for op '_weight_norm' not implemented.

Open devalexqt opened this issue 4 years ago • 16 comments

I'm trying to convert a PyTorch model to Core ML on macOS 12 (M1), but I get this error:

model = ct.convert(
    "./model.pt",
    inputs=[ct.TensorType(shape=(1,3,256,256))],
    source="pytorch"
)
    model = ct.convert(
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 326, in convert
    mlmodel = mil_convert(
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 182, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 209, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 300, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 104, in __call__
    return load(*args, **kwargs)
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 50, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 95, in _perform_torch_convert
    raise e
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 87, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 239, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/alex/miniforge3/envs/tf25_m1/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 73, in convert_nodes
    raise RuntimeError(
RuntimeError: PyTorch convert function for op '_weight_norm' not implemented.

devalexqt commented Nov 17 '21 13:11

@devalexqt - can you construct a simple PyTorch model that when converted gives this error?

TobyRoseman commented Nov 17 '21 20:11

I'm trying to convert https://github.com/jantic/DeOldify to Core ML format:

trace = torch.jit.trace(model, dummy_input)
out = trace(dummy_input)
torch.jit.save(trace, "./models/model.pt")

devalexqt commented Nov 17 '21 21:11

@devalexqt - it looks like there are a few different models associated with that project. Can you provide a direct link to the model you're using?

I can't find any information about _weight_norm. Does anyone have information about it? Is it the same as weight_norm?

TobyRoseman commented Nov 19 '21 21:11

I'm trying the first of the "Completed Generator Weights", the "Artistic" one: https://data.deepai.org/deoldify/ColorizeArtistic_gen.pth

devalexqt commented Nov 19 '21 22:11

Any update on this? I'm trying to export a UNet model trained with FastAI and am hitting the same issue with the _weight_norm op/layer. My understanding is that it's the same as weight_norm, and I tried to remove it with remove_weight_norm() without success. Any help?

JacopoMangiavacchi commented Dec 14 '21 02:12

It looks like _weight_norm is what weight_norm gets lowered to. Here is a minimal example to reproduce the problem:

import coremltools as ct
import torch 

m = torch.nn.utils.weight_norm(torch.nn.Linear(20, 40))
m = torch.jit.trace(m, torch.randn(20,))
ct.convert(m, inputs=[ct.TensorType(shape=(20,))])

I'm going to reopen this issue.

TobyRoseman commented Dec 15 '21 21:12

I just removed it everywhere and the model started working.
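
For anyone else hitting this, a minimal sketch of one way to strip the hooks from every submodule before tracing (assuming they were registered under the default "weight" name):

import torch
from torch.nn.utils import remove_weight_norm

def strip_weight_norm(model: torch.nn.Module) -> torch.nn.Module:
    # Fold g and v back into a plain weight on every submodule that has a
    # weight_norm hook; modules without one raise ValueError, which we skip.
    for module in model.modules():
        try:
            remove_weight_norm(module)
        except ValueError:
            pass
    return model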

devalexqt commented Dec 16 '21 00:12

Unfortunately I can't call remove_weight_norm() or change the module architecture, since I only have the TorchScript of the PyTorch model I want to export. Anyway, I was able to export this UNet model with BatchNorm2d layers by registering the following op:

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op
def _weight_norm(context, node):
    inputs = _get_inputs(context, node, expected=3)
    # Elementwise product of the first two inputs, ignoring the norm entirely
    xy = mb.mul(x=inputs[0], y=inputs[1])
    context.add(xy, node.name)

Unfortunately, even though it exports successfully, the Core ML model does not really work as expected. Could someone please review whether the multiplication above is a correct implementation of weight normalization at inference time? Thanks so much!!!

JacopoMangiavacchi commented Dec 20 '21 03:12

I looked at the weight_norm formula ( w = g / ||v|| * v ) and reimplemented the op this way, but it still doesn't produce the result I was expecting.

@register_torch_op
def _weight_norm(context, node):
    inputs = _get_inputs(context, node, expected=3)
    g = inputs[0]
    v = inputs[1]
    abs_v = mb.abs(x=v)  # elementwise |v|, not the tensor norm ||v||
    inv_abs_v = mb.inverse(x=abs_v)
    gv = mb.mul(x=g, y=inv_abs_v)
    w = mb.mul(x=gv, y=v)

    context.add(w, node.name)

I wonder if 'g' and 'v' are really the input tensors I'm receiving here.
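
One way to check both the input ordering and the math against PyTorch itself is to call the underlying aten op directly. A small sketch for the dim == 0 case:

import torch

# Compare the builtin aten op against w = v * (g / ||v||), where ||v|| is
# taken per output row (norm_except_dim with dim == 0), not elementwise.
v = torch.randn(40, 20)  # direction tensor (weight_v)
g = torch.randn(40, 1)   # magnitude tensor (weight_g)
w_builtin = torch._weight_norm(v, g, 0)
row_norm = v.reshape(v.size(0), -1).norm(2, dim=1).view(v.size(0), 1)
w_manual = v * (g / row_norm)
print(torch.allclose(w_builtin, w_manual))  # expected: True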

JacopoMangiavacchi commented Dec 22 '21 00:12

> I looked at the weight_norm formula ( w = g / ||v|| * v ) and reimplemented the op this way, but it still doesn't produce the result I was expecting. [...] I wonder if 'g' and 'v' are really the input tensors I'm receiving here.

@JacopoMangiavacchi Any update?

chinsyo commented Jan 04 '22 08:01

@chinsyo I'm still waiting for advice on how to correctly implement the _weight_norm op

JacopoMangiavacchi commented Jan 04 '22 17:01

> @chinsyo I'm still waiting for advice on how to correctly implement the _weight_norm op

https://github.com/pytorch/pytorch/blob/49a07c892265ed89ed8302db15af4647746f6543/torch/nn/utils/weight_norm.py#L47

@JacopoMangiavacchi I looked into the source code and found that with the default dim=0, weight_norm goes through the norm_except_dim function, which the simple formula doesn't capture. I suspect this may be the reason. Please take a look; I hope it helps and look forward to your good news.
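
As a quick sketch of what norm_except_dim computes in practice (torch exposes the helper directly):

import torch

# norm_except_dim(v, 2, 0) keeps dim 0 and takes the L2 norm over all the
# remaining dims, so you get one norm per output row, not an elementwise |v|.
v = torch.randn(40, 20)
n = torch.norm_except_dim(v, 2, 0)
print(n.shape)  # torch.Size([40, 1])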

chinsyo commented Jan 08 '22 14:01

Thank you @chinsyo, but I'm not sure I understand your advice. Where do you see that the check for dim == 0 triggers the norm_except_dim call? I can't find the norm_except_dim call in the PyTorch source code you linked above.

JacopoMangiavacchi commented Jan 08 '22 18:01

> Thank you @chinsyo, but I'm not sure I understand your advice. Where do you see that the check for dim == 0 triggers the norm_except_dim call? [...]

@JacopoMangiavacchi I hope the links below are helpful.

PyTorch OPSET

        # W = g * ((v) / ||v||)
        # Compute norm_except_dim for l2 norm. dim = None means over all dims
        # torch's weight_norm module sets dim = -1 if it's None.
        # This conflicts the logic for negative axes to access dims backwards
        # TODO: Might need a fix in torch group_norm module

WeightNorm Implementation

        # add g and v as new parameters and express w as g/||v|| * v
        module.register_parameter(name + '_g', Parameter(norm_except_dim(weight, 2, dim).data))

_weight_norm & norm_except_dim C++ source code

// Staying faithful to the Python for now for clarity, look for optimizations later
// (e.g., single return statement for RVO)
Tensor norm_except_dim(const Tensor & v, int64_t pow, int64_t dim)
{
  // I assume tensor.contiguous(), view(), norm(), etc. here will dispatch through VariableType.
  if (dim == -1) {
    return v.norm(pow);
  } else if (dim == 0) {
    std::vector<int64_t> output_size(v.dim(), 1);
    output_size[0] = v.size(0);
    return v.contiguous().view({v.size(0), -1}).norm(pow, 1).view(output_size);
  } else if (dim == v.dim() - 1) {
    std::vector<int64_t> output_size(v.dim(), 1);
    output_size[v.dim() - 1] = v.size(v.dim() - 1);
    return v.contiguous().view({-1, v.size(v.dim() - 1)}).norm(pow, 0).view(output_size);
  } else {
    // To consider: at::native::norm_except_dim is probably fine as well,
    // and would avoid an additional dynamic dispatch.
    return at::norm_except_dim(v.transpose(0, dim), pow, 0).transpose(0, dim); // optimize?
  }
}

Tensor _weight_norm
  (const Tensor & v_in,
   const Tensor & g_in,
   int64_t dim)
{

  TORCH_CHECK(
    v_in.device() == g_in.device(),
    "weight_norm: expected v_in and g_in to be on the same device, but v_in is "
    "on ", v_in.device(), " and g_in is on ", g_in.device());

  auto v = v_in.contiguous();
  auto g = g_in.contiguous();

  bool can_use_fused = v.is_cuda() && (dim == 0 || dim == v.dim() - 1);

  if (can_use_fused) {
    // weight_norm does not have a derivative defined for it, so this will route back through
    // VariableType.cpp, and construct a WeightNormFusedBackward object in the autograd graph.
    return std::get<0>(at::_weight_norm_cuda_interface(v, g, dim));
  } else {
    // Double-differentiable primitive ops
    // at::native::norm_except_dim would probably be fine as well.
    return v*(g/at::norm_except_dim(v, 2, dim));
  }
}
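
Putting this together, a minimal, untested sketch of a composite registration for the common dim == 0 case (weight_norm's default), assuming the node inputs arrive in the (v, g, dim) order of the aten signature above:

from coremltools.converters.mil import Builder as mb
from coremltools.converters.mil.frontend.torch.ops import _get_inputs
from coremltools.converters.mil.frontend.torch.torch_op_registry import register_torch_op

@register_torch_op
def _weight_norm(context, node):
    v, g, dim = _get_inputs(context, node, expected=3)
    assert dim.val == 0, "this sketch only covers weight_norm's default dim=0"
    # norm_except_dim(v, 2, 0): L2 norm over every axis except the first,
    # with keep_dims so the result broadcasts back against v.
    norm = mb.reduce_l2_norm(x=v, axes=list(range(1, v.rank)), keep_dims=True)
    w = mb.mul(x=v, y=mb.real_div(x=g, y=norm))
    context.add(w, node.name)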

chinsyo commented Jan 09 '22 12:01

Any update? 🥺

amyaots commented Jan 24 '22 20:01

Any update? 🥺🥺🥺

happy-jihye commented Sep 21 '22 05:09