
Is it possible to export ONNX model with 320x320 resolution?

Open brainstormi opened this issue 7 months ago • 8 comments

I'm trying to run the model on an Intel iGPU and want to reduce the input resolution to 320x320 for real-time processing. However, 640x640 is hardcoded in export_onnx, and the export fails if I change it to 320. Is there any way to adapt the model to 320x320? Regards,

brainstormi avatar May 09 '25 10:05 brainstormi

Hi,

It is possible, but you need to change the config files. Change the eval_spatial_size to [320,320] https://github.com/Peterande/D-FINE/blob/4a1f73a8bcfac736a88abde9596d87f116d780a7/configs/dfine/include/dfine_hgnetv2.yml#L8

The reason is that the D-FINE encoder uses fixed-size spatial embeddings, which are initialized here.
https://github.com/Peterande/D-FINE/blob/4a1f73a8bcfac736a88abde9596d87f116d780a7/src/zoo/dfine/dfine_decoder.py#L621-L622 https://github.com/Peterande/D-FINE/blob/4a1f73a8bcfac736a88abde9596d87f116d780a7/src/zoo/dfine/dfine_decoder.py#L731-L754

Note that D-FINE's reported metrics are for an input size of 640x640.

SebastianJanampa avatar May 10 '25 18:05 SebastianJanampa

I changed eval_spatial_size from [640, 640] to [320, 320], but there are still errors.

senstar-hsoleimani avatar Jul 12 '25 21:07 senstar-hsoleimani

Could you please provide the error message?

SebastianJanampa avatar Jul 13 '25 15:07 SebastianJanampa

@SebastianJanampa I trained the nano version at 640x640 resolution. Now I want to export it at 320x320 resolution. In /include/dfine_hgnetv2.yml, I changed eval_spatial_size to [320, 320]. Also, in export_onnx, I changed these lines:

data = torch.rand(1, 3, 320, 320).to('cpu')
size = torch.tensor([[320, 320]])

The error is

D-FINE\dfine\lib\site-packages\torch\nn\modules\module.py", line 2584, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for DFINE:
    size mismatch for decoder.anchors: copying a param with shape torch.Size([1, 2000, 4]) from checkpoint, the shape in current model is torch.Size([1, 500, 4]).
    size mismatch for decoder.valid_mask: copying a param with shape torch.Size([1, 2000, 1]) from checkpoint, the shape in current model is torch.Size([1, 500, 1]).

Am I missing something here? Should I train the model with 320*320 and then export it with the same resolution?

senstar-hsoleimani avatar Jul 13 '25 22:07 senstar-hsoleimani

The problem is in these lines: https://github.com/Peterande/D-FINE/blob/d6694750683b0c7e9f523ba6953d16f112a376ae/src/zoo/dfine/dfine_decoder.py#L615-L622

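When you create and train D-FINE, you also create non-trainable parameters, self.anchors and self.valid_mask. They are saved in the checkpoint with the shapes they had at the training resolution, which is why they no longer match a model built for 320x320.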
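
As a rough check of where the 2000 and 500 in the error message come from, the sketch below counts one anchor per feature-map cell. The strides (16, 32) are an assumption about the nano configuration, not something stated in this thread, and `count_anchors` is a hypothetical helper; with those strides the arithmetic reproduces both numbers.

```python
from typing import Sequence


def count_anchors(height: int, width: int, strides: Sequence[int]) -> int:
    """Count anchors assuming one anchor per cell of each feature map."""
    return sum((height // s) * (width // s) for s in strides)


# Assumption: the nano model uses two feature levels with strides 16 and 32.
STRIDES = (16, 32)

anchors_640 = count_anchors(640, 640, STRIDES)  # 40*40 + 20*20
anchors_320 = count_anchors(320, 320, STRIDES)  # 20*20 + 10*10

print(anchors_640, anchors_320)
```

Halving each side of the input divides the number of cells, and therefore the anchor count, by four, so a checkpoint saved at 640x640 cannot fill tensors sized for 320x320.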

The quick solution is to not load those weights from the checkpoint file. You could do something like:

"""
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
Copyright (c) 2024 The D-FINE Authors. All Rights Reserved.
---------------------------------------------------------------------------------
Modified from RT-DETR (https://github.com/lyuwenyu/RT-DETR)
Copyright (c) 2023 lyuwenyu. All Rights Reserved.
"""

import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "../.."))

import torch
import torch.nn as nn

from src.core import YAMLConfig


def main(
    args,
):
    """main"""
    cfg = YAMLConfig(args.config, resume=args.resume)

    if "HGNetv2" in cfg.yaml_cfg:
        cfg.yaml_cfg["HGNetv2"]["pretrained"] = False

    if args.resume:
        checkpoint = torch.load(args.resume, map_location="cpu")
        if "ema" in checkpoint:
            state = checkpoint["ema"]["module"]
        else:
            state = checkpoint["model"]
        
        # Drop the anchors and valid_mask entries saved in the checkpoint;
        # their shapes depend on the resolution the model was trained at.
        new_checkpoint = {}
        for k in state:
            if "anchors" in k or "valid_mask" in k:
                print(k)
                continue
            new_checkpoint[k] = state[k]

        # NOTE load train mode state -> convert to deploy mode
        cfg.model.load_state_dict(new_checkpoint)

    else:
        # raise AttributeError('Only support resume to load model.state_dict by now.')
        print("not load model.state_dict, use default init state dict...")

    class Model(nn.Module):
        def __init__(
            self,
        ) -> None:
            super().__init__()
            self.model = cfg.model.deploy()
            self.postprocessor = cfg.postprocessor.deploy()

        def forward(self, images, orig_target_sizes):
            outputs = self.model(images)
            outputs = self.postprocessor(outputs, orig_target_sizes)
            return outputs

    model = Model()

    data = torch.rand(32, 3, 320, 320)
    size = torch.tensor([[320, 320]])
    _ = model(data, size)

    dynamic_axes = {
        "images": {
            0: "N",
        },
        "orig_target_sizes": {0: "N"},
    }

    output_file = args.resume.replace(".pth", ".onnx") if args.resume else "model.onnx"

    torch.onnx.export(
        model,
        (data, size),
        output_file,
        input_names=["images", "orig_target_sizes"],
        output_names=["labels", "boxes", "scores"],
        dynamic_axes=dynamic_axes,
        opset_version=16,
        verbose=False,
        do_constant_folding=True,
    )

    if args.check:
        import onnx

        onnx_model = onnx.load(output_file)
        onnx.checker.check_model(onnx_model)
        print("Check export onnx model done...")

    if args.simplify:
        import onnx
        import onnxsim

        dynamic = True
        # input_shapes = {'images': [1, 3, 320, 320], 'orig_target_sizes': [1, 2]} if dynamic else None
        input_shapes = {"images": data.shape, "orig_target_sizes": size.shape} if dynamic else None
        onnx_model_simplify, check = onnxsim.simplify(output_file, test_input_shapes=input_shapes)
        onnx.save(onnx_model_simplify, output_file)
        print(f"Simplify onnx model {check}...")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--config",
        "-c",
        default="configs/dfine/dfine_hgnetv2_l_coco.yml",
        type=str,
    )
    parser.add_argument(
        "--resume",
        "-r",
        type=str,
    )
    parser.add_argument(
        "--check",
        action="store_true",
        default=True,
    )
    parser.add_argument(
        "--simplify",
        action="store_true",
        default=True,
    )
    args = parser.parse_args()
    main(args)

SebastianJanampa avatar Jul 13 '25 23:07 SebastianJanampa

It results in another error. It seems we cannot simply skip valid_mask and anchors; they need to be initialized with dummy inputs.

    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for DFINE:
    Missing key(s) in state_dict: "decoder.anchors", "decoder.valid_mask"

senstar-hsoleimani avatar Jul 14 '25 13:07 senstar-hsoleimani

My mistake. Please use this code:

cfg.model.load_state_dict(new_checkpoint, strict=False)

Let me know if it works
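
To see the effect of strict=False in isolation, here is a minimal sketch with a toy module. `ToyDecoder` is hypothetical and only imitates the situation in this thread: it stores `anchors` and `valid_mask` as buffers whose length depends on the input size, which is an assumption about how the real decoder keeps them. The strides are likewise assumed.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Hypothetical stand-in for a decoder with size-dependent buffers."""

    def __init__(self, size: int, strides=(16, 32)) -> None:
        super().__init__()
        num_anchors = sum((size // s) ** 2 for s in strides)
        self.proj = nn.Linear(4, 4)  # learned weights, independent of input size
        self.register_buffer("anchors", torch.zeros(1, num_anchors, 4))
        self.register_buffer("valid_mask", torch.ones(1, num_anchors, 1))


# "Checkpoint" from a model built for 640x640; target model built for 320x320.
checkpoint = ToyDecoder(640).state_dict()
target = ToyDecoder(320)

# Drop the size-dependent entries so the target keeps its own 320x320 values.
filtered = {
    k: v
    for k, v in checkpoint.items()
    if "anchors" not in k and "valid_mask" not in k
}

# strict=False tolerates keys that the model expects but `filtered` lacks.
result = target.load_state_dict(filtered, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)
```

Because strict=False also hides genuine mismatches such as misspelled keys, it is worth checking that `missing_keys` lists only the entries that were dropped on purpose.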

SebastianJanampa avatar Jul 14 '25 13:07 SebastianJanampa

Yes, it worked this time. Thanks.

senstar-hsoleimani avatar Jul 14 '25 13:07 senstar-hsoleimani