
Clip convert error: Dynamic value of min/max is not implemented

Open ongiaf opened this issue 1 year ago • 8 comments

Hi, I have an ONNX model. Here is one of the nodes in the ONNX graph:

node {
  input: "/decoder.0/layers.0/blocks.0/attn/Pow_3_output_0"
  input: "/decoder.0/layers.0/blocks.0/attn/Constant_13_output_0"
  input: ""
  output: "/decoder.0/layers.0/blocks.0/attn/Clip_1_output_0"
  name: "/decoder.0/layers.0/blocks.0/attn/Clip_1"
  op_type: "Clip"
  doc_string: "...."
}

When I tried to convert it to a torch model, it raised the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/micromamba/envs/ai-models/lib/python3.10/site-packages/onnx2torch/converter.py", line 110, in convert
    torch_module, onnx_mapping = converter(onnx_node, onnx_graph)
  File "/root/micromamba/envs/ai-models/lib/python3.10/site-packages/onnx2torch/node_converters/clip.py", line 60, in _
    raise NotImplementedError('Dynamic value of min/max is not implemented') from exc
NotImplementedError: Dynamic value of min/max is not implemented

It may be caused by https://github.com/ENOT-AutoDL/onnx2torch/blob/a8b060336c8c95c51a6257a8d99171f0b86b8eab/onnx2torch/node_converters/clip.py#L60

After adding the following conditions:

min_val = float(get_const_value(min_name, graph)) if (min_name is not None and min_name != '') else None
max_val = float(get_const_value(max_name, graph)) if (max_name is not None and max_name != '') else None

the conversion works.
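For context, ONNX encodes an omitted optional input (such as Clip's min or max) as an empty string, which is why the converter needs to treat `''` the same as a missing input. A minimal sketch of the resulting torch module (not the actual onnx2torch implementation):

```python
import torch
from torch import nn


class ClipSketch(nn.Module):
    """Sketch of ONNX Clip with optional bounds. An omitted bound
    arrives in the graph as an empty-string input name and should
    become None here."""

    def __init__(self, min_val=None, max_val=None):
        super().__init__()
        self.min_val = min_val
        self.max_val = max_val

    def forward(self, x):
        # torch.clamp accepts None for either bound (but not both).
        return torch.clamp(x, min=self.min_val, max=self.max_val)
```

The node quoted above has an empty third input, i.e. no max bound, so it corresponds to something like `ClipSketch(min_val=...)` with `max_val=None`.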

ongiaf avatar Dec 15 '23 04:12 ongiaf

The full ONNX model can be downloaded from here:

  1. https://get.ecmwf.int/repository/test-data/ai-models/fuxi/short.onnx
  2. ONNX External Data: https://get.ecmwf.int/repository/test-data/ai-models/fuxi/short

ongiaf avatar Dec 15 '23 05:12 ongiaf

@ongiaf Have you had any success decoding FuXi? (I've been fine-tuning this model for a long time.) I recommend paying attention to this solution.

dsuhoi avatar Feb 06 '24 15:02 dsuhoi

Thanks, it's excellent work. With some dirty work, FuXi can successfully run on PyTorch via onnx2torch. The problems in onnx2torch are mainly with LayerNormalization and Clip.

ongiaf avatar Feb 07 '24 17:02 ongiaf

@ongiaf Did you manage to run FuXi with the current weights for the fine-tuning process? (I am currently thinking about how to finish work on the model on a 1-hour grid, and thought about freezing all layers except the U-Transformer.)

dsuhoi avatar Feb 07 '24 17:02 dsuhoi

> Thanks, it's excellent work. And with some dirty work, Fuxi can successfully run on PyTorch with Onnx2Torch. In Onnx2Torch, problems are mainly about LayerNormalization and Clip.

Thank you for posting your changes to Clip. Could you also suggest how to fix LayerNormalization? It looks like the converted model has an issue with the torch.layer_norm call.

juanqiu1 avatar Feb 20 '24 15:02 juanqiu1

@juanqiu1 To get this working with FuXi, you will need to change the normalized_shape parameter in onnx2torch/node_converters/layer_norm.py to 1536:

@add_converter(operation_type='LayerNormalization', version=17)
def _(node: OnnxNode, graph: OnnxGraph) -> OperationConverterResult:
    node_attributes = node.attributes

    axis = node_attributes.get('axis', AXIS_DEFAULT_VALUE)
    epsilon = node_attributes.get('epsilon', EPSILON_DEFAULT_VALUE)

    if all(value_name in graph.initializers for value_name in node.input_values[1:]):
        input_value_info = graph.value_info[node.input_values[0]]
        input_shape = get_shape_from_value_info(input_value_info)
        torch_module = nn.LayerNorm(
        normalized_shape=(1536,),  # was: input_shape[axis:] (this is the changed line)
            eps=epsilon,
            elementwise_affine=True,
        )
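A side note on the snippet above (my observation, not from the thread): `(1536)` is just the int 1536, since a one-element tuple needs a trailing comma. nn.LayerNorm accepts either form and normalizes over a trailing dimension of that size:

```python
import torch
from torch import nn

# An int and a 1-tuple are equivalent as normalized_shape: both
# normalize over the last dimension. The affine parameters init to
# ones/zeros, so two freshly constructed modules agree exactly.
ln_int = nn.LayerNorm(normalized_shape=1536)
ln_tup = nn.LayerNorm(normalized_shape=(1536,))

x = torch.randn(2, 4, 1536)
assert torch.allclose(ln_int(x), ln_tup(x))
```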

dsuhoi avatar Feb 20 '24 16:02 dsuhoi

@dsuhoi Thank you for the hint; there are a couple of other easy fixes (typing, etc.). Did you manage to run FuXi with the current weights for the fine-tuning process? Do you have any progress on that? After conversion, I loaded the model into PyTorch, but even on an A100 with FSDP enabled via accelerate, I still get a CUDA out-of-memory error.

juanqiu1 avatar Feb 21 '24 13:02 juanqiu1

@juanqiu1 Yes, I managed to start the training process by selecting, via named_parameters(), only the last dozen U-Transformer blocks (this was enough for fine-tuning).

I used Nvidia A100 (40GB).
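Freezing everything except the transformer blocks can be sketched like this (the name filter and toy model are hypothetical placeholders; the real U-Transformer parameter names must be read from the converted model's named_parameters()):

```python
import torch
from torch import nn


def freeze_except(model: nn.Module, keyword: str) -> None:
    """Leave only parameters whose name contains `keyword` trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name


# Hypothetical toy model standing in for the converted FuXi module.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(8, 8)
        self.u_transformer = nn.Linear(8, 8)


model = Toy()
freeze_except(model, 'u_transformer')
```

After this, only parameters with `requires_grad=True` need to be passed to the optimizer, which cuts the optimizer-state memory accordingly.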

dsuhoi avatar Feb 21 '24 13:02 dsuhoi