
During the RFDETRSegPreview ONNX export call, why does the desired shape have to be divisible by both 24 and 14?

Open Abdul-Mukit opened this issue 1 month ago • 2 comments

I was trying to export RFDETRSegPreview as follows:

from rfdetr import RFDETRSegPreview

model = RFDETRSegPreview(pretrain_weights=output_dir + "/checkpoint_best_ema.pth", device="cpu")
export_image_shape = (560, 560)
model.export(
    output_dir=output_dir,
    verbose=True,
    shape=export_image_shape,
)

I get the complaint:

File ~/projects/rf-detr/rfdetr/main.py:551, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    549     print(f"PyTorch inference output shape: {features.shape}")
    550 elif self.args.segmentation_head:
--> 551     outputs = model(input_tensors)
    552     dets = outputs['pred_boxes']
...
--> 187     assert x.shape[2] % block_size == 0 and x.shape[3] % block_size == 0, f"Backbone requires input shape to be divisible by {block_size}, but got {x.shape}"
    188     x = self.encoder(x)
    189     return list(x[0])

AssertionError: Backbone requires input shape to be divisible by 24, but got torch.Size([1, 3, 560, 560])

If I set export_image_shape=(432, 432) - which is what is used anyway when shape=None is passed to export - I get the following complaint:

File ~/projects/rf-detr/rfdetr/main.py:539, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    537 else:
    538     if shape[0] % 14 != 0 or shape[1] % 14 != 0:
--> 539         raise ValueError("Shape must be divisible by 14")
    541 input_tensors = make_infer_image(infer_dir, shape, batch_size, device).to(device)
    542 input_names = ['input']

I was trying to export the model at a higher resolution. I have trained and run predictions with the segmentation model, and at an image resolution of 432 the output masks are too grainy for some of our robotics applications.

Abdul-Mukit · Oct 07 '25 20:10

That's a bug. Don't set shape, just set resolution. It should only need to be divisible by 24.

isaacrob-roboflow · Oct 08 '25 02:10
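Following that advice, here is a minimal helper for picking a resolution, assuming the only hard constraint is divisibility by 24 (with an option for the stricter divisible-by-both case that the shape path currently enforces). Plain Python, not part of the rf-detr API:

import math

def valid_resolutions(lo, hi, also_by_14=False):
    # lcm(24, 14) == 168 covers both checks; otherwise 24 alone suffices.
    step = math.lcm(24, 14) if also_by_14 else 24
    start = math.ceil(lo / step) * step
    return list(range(start, hi + 1, step))

print(valid_resolutions(500, 600))                   # [504, 528, 552, 576, 600]
print(valid_resolutions(500, 600, also_by_14=True))  # [504]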

Thanks, this worked. But I found more bugs.

The mask output's name is being printed as 4245. That is because "masks" was not added to output_names in https://github.com/roboflow/rf-detr/blob/9fd97893c221a8335a319e4d59f830ff8cfb1716/rfdetr/main.py#L543
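A minimal sketch of the fix, assuming the export path builds an output_names list before calling torch.onnx.export (the variable names here are hypothetical, taken from the traceback and the ONNX output below):

output_names = ['dets', 'labels']
if self.args.segmentation_head:
    output_names.append('masks')  # without this, ONNX assigns an auto-generated name like 4245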

The initial mention of 2600 detections was also quite misleading. The export code calls self.model.eval() instead of model.eval() before starting to export the model.
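In other words (a sketch only; model here stands for whatever wrapped module export actually traces, per the traceback above):

model.eval()         # puts the module that actually gets traced into eval mode
# self.model.eval()  # what the code currently calls; the traced wrapper keeps its train-mode behavior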

import os

from rfdetr import RFDETRSegPreview

SegModel = RFDETRSegPreview

export_resolution = 576  # divisible by 24
model = SegModel(
    resolution=export_resolution,
    pretrain_weights=pretrain_weights_path,  # defined earlier
    device="cpu",
)

export_dir = os.path.join(output_dir, "export")
model.export(
    output_dir=export_dir,
    verbose=False,
)

import onnxruntime as ort

session = ort.InferenceSession(os.path.join(export_dir, "inference_model.onnx"))

print("ONNX input details: ")
for inp in session.get_inputs():
    print(inp.name, inp.shape, inp.type)

print("ONNX output details: ")
for out in session.get_outputs():
    print(out.name, out.shape, out.type)

Output:

PyTorch inference output shapes - Boxes: torch.Size([1, 2600, 4]), Labels: torch.Size([1, 2600, 5]), Masks: torch.Size([1, 2600, 144, 144])

Successfully exported ONNX model: /home/projects/rf-detr/output/RFDETRSegPreview/export/inference_model.onnx
ONNX export completed successfully
ONNX input details: 
input [1, 3, 576, 576] tensor(float)
ONNX output details: 
dets [1, 200, 4] tensor(float)
labels [1, 200, 5] tensor(float)
4245 [1, 'Addlabels_dim_1', 144, 144] tensor(float)
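As a quick sanity check on the exported graph (assuming the input name "input" shown above; numpy only):

import numpy as np

dummy = np.zeros((1, 3, 576, 576), dtype=np.float32)
outs = session.run(None, {"input": dummy})
for meta, arr in zip(session.get_outputs(), outs):
    print(meta.name, arr.shape)  # the third output still prints under the auto-generated name 4245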

Abdul-Mukit · Oct 08 '25 20:10