[Bug]: Allow extra kwargs in to_onnx() and/or engine.export() to be passed to torch.onnx.export
Describe the bug
When trying to export a model larger than 2 GiB, the ONNX export fails with the following error:
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
File <command-4284593085184757>, line 1
----> 1 engine.export(model=model, export_root=TMP, export_type=ExportType.ONNX, transform=transform)
where model is a Padim with image size 512x512 and a wide_resnet50_2 backbone with default parameters (layers and features). Note that TMP is already a path to a directory, not a path to an .onnx file.
Large ONNX models are exported with external data (that is, split across multiple files, rather than everything being included in the single .onnx file), as specified here. And, indeed, torch.onnx.export accepts an external_data argument. That argument is not exposed in Anomalib, since the to_onnx() method defined in PyTorch Lightning, which accepts **kwargs that are passed on to torch.onnx.export, is overridden by the Anomalib ExportMixin (used in the base AnomalyModule).
My guess is that exposing **kwargs passed to torch.onnx.export would solve the issue and allow exporting large ONNX models.
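For illustration, a minimal sketch of what the change could look like. The signature and the torch.onnx.export call below are copied from the traceback in the Logs section; the **kwargs forwarding is the proposed addition, not the actual anomalib code:

```python
def to_onnx(self, export_root, input_size=None, transform=None, task=None, **kwargs):
    # ... unchanged setup from the current ExportMixin.to_onnx ...
    onnx_path = export_root / "model.onnx"
    torch.onnx.export(
        inference_model,
        input_shape.to(self.device),
        str(onnx_path),
        opset_version=14,
        dynamic_axes=dynamic_axes,
        input_names=["input"],
        output_names=["output"],
        **kwargs,  # forwarded verbatim, e.g. external_data=True
    )
    return onnx_path
```

engine.export() would then need to forward the same **kwargs down to model.to_onnx().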
Dataset
Other (please specify in the text field below)
Model
PADiM
Steps to reproduce the behavior
Train a large enough Padim model on any dataset, for example:
- image size 512x512
- backbone: wide_resnet50_2 with default layers and features
Try to export the model to ONNX with
engine.export(model=model, export_root=TMP, export_type=ExportType.ONNX, transform=transform)
where TMP is the path to a local directory.
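Putting it together, a hypothetical end-to-end reproduction (the MVTec datamodule and its arguments are placeholders and assumptions; any dataset trained at this image size should trigger the error):

```python
from torchvision.transforms.v2 import Resize

from anomalib.data import MVTec        # placeholder: any dataset works
from anomalib.deploy import ExportType
from anomalib.engine import Engine
from anomalib.models import Padim

transform = Resize((512, 512))
# Datamodule arguments are assumptions; what matters is training at
# 512x512 so that the resulting Padim model exceeds the 2 GiB limit.
datamodule = MVTec(category="bottle", image_size=(512, 512))
model = Padim(backbone="wide_resnet50_2")  # default layers and features

engine = Engine()
engine.fit(model=model, datamodule=datamodule)

# Raises: RuntimeError: The serialized model is larger than the 2GiB limit ...
engine.export(model=model, export_root="/tmp/export",
              export_type=ExportType.ONNX, transform=transform)
```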
OS information
- OS: Ubuntu 20.04
- Python version: 3.10
- Anomalib version: 1.2.0
- PyTorch version: 2.5.1
- CUDA/cuDNN version: 12.4
- Any other relevant information: training on a custom dataset, but any dataset with image size 512x512 and the specified model parameters should be enough to reproduce the error.
Expected behavior
The methods engine.export() and/or model.to_onnx() should accept extra kwargs that will be passed to torch.onnx.export(), in order to simplify the export process of large models.
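A hypothetical usage once that is in place (external_data is an existing torch.onnx.export argument, visible in its signature in the Logs section; the pass-through itself is the requested feature, not current anomalib behavior):

```python
engine.export(
    model=model,
    export_root=TMP,
    export_type=ExportType.ONNX,
    transform=transform,
    external_data=True,  # forwarded verbatim to torch.onnx.export
)
```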
Screenshots
Pip/GitHub
pip
What version/branch did you use?
No response
Configuration YAML
No configuration used
Logs
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
File <command-4284593085184757>, line 1
----> 1 engine.export(model=model, export_root=TMP, export_type=ExportType.ONNX, transform=transform)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-cecb19f8-6e0a-4bc0-ba3d-c636bf461e18/lib/python3.10/site-packages/anomalib/engine/engine.py:958, in Engine.export(self, model, export_type, export_root, input_size, transform, compression_type, datamodule, metric, ov_args, ckpt_path)
952 exported_model_path = model.to_torch(
953 export_root=export_root,
954 transform=transform,
955 task=self.task,
956 )
957 elif export_type == ExportType.ONNX:
--> 958 exported_model_path = model.to_onnx(
959 export_root=export_root,
960 input_size=input_size,
961 transform=transform,
962 task=self.task,
963 )
964 elif export_type == ExportType.OPENVINO:
965 exported_model_path = model.to_openvino(
966 export_root=export_root,
967 input_size=input_size,
(...)
973 ov_args=ov_args,
974 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-cecb19f8-6e0a-4bc0-ba3d-c636bf461e18/lib/python3.10/site-packages/anomalib/models/components/base/export_mixin.py:151, in ExportMixin.to_onnx(self, export_root, input_size, transform, task)
149 _write_metadata_to_json(self._get_metadata(task), export_root)
150 onnx_path = export_root / "model.onnx"
--> 151 torch.onnx.export(
152 inference_model,
153 input_shape.to(self.device),
154 str(onnx_path),
155 opset_version=14,
156 dynamic_axes=dynamic_axes,
157 input_names=["input"],
158 output_names=["output"],
159 )
161 return onnx_path
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-cecb19f8-6e0a-4bc0-ba3d-c636bf461e18/lib/python3.10/site-packages/torch/onnx/__init__.py:375, in export(model, args, f, kwargs, export_params, verbose, input_names, output_names, opset_version, dynamic_axes, keep_initializers_as_inputs, dynamo, external_data, dynamic_shapes, report, verify, profile, dump_exported_program, artifacts_dir, fallback, training, operator_export_type, do_constant_folding, custom_opsets, export_modules_as_functions, autograd_inlining, **_)
369 if dynamic_shapes:
370 raise ValueError(
371 "The exporter only supports dynamic shapes "
372 "through parameter dynamic_axes when dynamo=False."
373 )
--> 375 export(
376 model,
377 args,
378 f, # type: ignore[arg-type]
379 kwargs=kwargs,
380 export_params=export_params,
381 verbose=verbose is True,
382 input_names=input_names,
383 output_names=output_names,
384 opset_version=opset_version,
385 dynamic_axes=dynamic_axes,
386 keep_initializers_as_inputs=keep_initializers_as_inputs,
387 training=training,
388 operator_export_type=operator_export_type,
389 do_constant_folding=do_constant_folding,
390 custom_opsets=custom_opsets,
391 export_modules_as_functions=export_modules_as_functions,
392 autograd_inlining=autograd_inlining,
393 )
394 return None
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-cecb19f8-6e0a-4bc0-ba3d-c636bf461e18/lib/python3.10/site-packages/torch/onnx/utils.py:502, in export(model, args, f, kwargs, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions, autograd_inlining)
499 if kwargs is not None:
500 args = args + (kwargs,)
--> 502 _export(
503 model,
504 args,
505 f,
506 export_params,
507 verbose,
508 training,
509 input_names,
510 output_names,
511 operator_export_type=operator_export_type,
512 opset_version=opset_version,
513 do_constant_folding=do_constant_folding,
514 dynamic_axes=dynamic_axes,
515 keep_initializers_as_inputs=keep_initializers_as_inputs,
516 custom_opsets=custom_opsets,
517 export_modules_as_functions=export_modules_as_functions,
518 autograd_inlining=autograd_inlining,
519 )
521 return None
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-cecb19f8-6e0a-4bc0-ba3d-c636bf461e18/lib/python3.10/site-packages/torch/onnx/utils.py:1564, in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions, autograd_inlining)
1561 dynamic_axes = {}
1562 _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
-> 1564 graph, params_dict, torch_out = _model_to_graph(
1565 model,
1566 args,
1567 verbose,
1568 input_names,
1569 output_names,
1570 operator_export_type,
1571 val_do_constant_folding,
1572 fixed_batch_size=fixed_batch_size,
1573 training=training,
1574 dynamic_axes=dynamic_axes,
1575 )
1577 # TODO: Don't allocate a in-memory string for the protobuf
1578 defer_weight_export = (
1579 export_type is not _exporter_states.ExportTypes.PROTOBUF_FILE
1580 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-cecb19f8-6e0a-4bc0-ba3d-c636bf461e18/lib/python3.10/site-packages/torch/onnx/utils.py:1117, in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
1114 params_dict = _get_named_param_dict(graph, params)
1116 try:
-> 1117 graph = _optimize_graph(
1118 graph,
1119 operator_export_type,
1120 _disable_torch_constant_prop=_disable_torch_constant_prop,
1121 fixed_batch_size=fixed_batch_size,
1122 params_dict=params_dict,
1123 dynamic_axes=dynamic_axes,
1124 input_names=input_names,
1125 module=module,
1126 )
1127 except Exception as e:
1128 torch.onnx.log("Torch IR graph at exception: ", graph)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-cecb19f8-6e0a-4bc0-ba3d-c636bf461e18/lib/python3.10/site-packages/torch/onnx/utils.py:663, in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict, dynamic_axes, input_names, module)
661 _C._jit_pass_lint(graph)
662 if GLOBALS.onnx_shape_inference:
--> 663 _C._jit_pass_onnx_graph_shape_type_inference(
664 graph, params_dict, GLOBALS.export_onnx_opset_version
665 )
667 return graph
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Hi, I am also encountering this exact issue. Do you know of a workaround for engine.export, or another way to export?
Hi @leemorton!
Unfortunately, I'm not sure there's a workaround for this. When exporting through the anomalib engine, the to_onnx method of the model is called internally. The obvious solution would be to call this method directly, instead of calling engine.export, but that doesn't help either. The problem is that the class defined here, which is a parent class of all anomalib models, overrides the to_onnx() method of PyTorch Lightning (all anomalib models are PyTorch Lightning models as well). But the overriding method does not expose extra kwargs, unlike what PyTorch Lightning does.
In theory, you could also call torch.onnx.export() directly on the underlying torch.nn.Module, which is the lowest level of the hierarchy: engine -> anomalib model -> pytorch lightning model -> torch.nn.Module. But I haven't tried it, and I'm not sure the exported model would be the same in this case.
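Roughly, an untested sketch of that idea (attribute and argument names beyond those visible in the traceback above are assumptions):

```python
import torch

# Untested sketch: bypass anomalib's ExportMixin entirely and call
# torch.onnx.export on the wrapped torch.nn.Module.
torch_model = model.model              # assumption: the underlying nn.Module
dummy_input = torch.zeros(1, 3, 512, 512)

torch.onnx.export(
    torch_model,
    dummy_input,
    "model.onnx",
    opset_version=14,
    input_names=["input"],
    output_names=["output"],
    external_data=True,  # present in the signature shown in the logs;
                         # may require dynamo=True on this torch version
)
```

Note that this skips anomalib's inference wrapper (transform, metadata), so the exported graph would not necessarily match what engine.export produces.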