During the RFDETRSegPreview ONNX export call, why does the desired shape have to be divisible by both 24 and 14?
I was trying to export RFDETRSegPreview like the following:

from rfdetr import RFDETRSegPreview

model = RFDETRSegPreview(
    pretrain_weights=output_dir + "/checkpoint_best_ema.pth",
    device="cpu",
)
export_image_shape = (560, 560)
model.export(
    output_dir=output_dir,
    verbose=True,
    shape=export_image_shape,
)
I get the complaint:
File ~/projects/rf-detr/rfdetr/main.py:551, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    549     print(f"PyTorch inference output shape: {features.shape}")
    550 elif self.args.segmentation_head:
--> 551     outputs = model(input_tensors)
    552     dets = outputs['pred_boxes']
...
File ~/projects/rf-detr/rfdetr/models/backbone/dinov2.py:187
--> 187 assert x.shape[2] % block_size == 0 and x.shape[3] % block_size == 0, f"Backbone requires input shape to be divisible by {block_size}, but got {x.shape}"
    188 x = self.encoder(x)
    189 return list(x[0])
AssertionError: Backbone requires input shape to be divisible by 24, but got torch.Size([1, 3, 560, 560])
If I set export_image_shape = (432, 432) - which is what export uses anyway when shape=None is passed - I get the following complaint:
File ~/projects/rf-detr/rfdetr/main.py:539, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    537 else:
    538     if shape[0] % 14 != 0 or shape[1] % 14 != 0:
--> 539         raise ValueError("Shape must be divisible by 14")
    541 input_tensors = make_infer_image(infer_dir, shape, batch_size, device).to(device)
    542 input_names = ['input']
I was trying to export the model at a higher resolution. I have trained and predicted with the segmentation model, but at an image resolution of 432 the output masks are too grainy for some of our robotics applications.
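For reference, both checks can only pass at once for multiples of lcm(24, 14) = 168. A throwaway snippet (not part of rf-detr) to list such sizes:

import math

# Shapes passing both the %24 and %14 checks are multiples of lcm(24, 14).
block = math.lcm(24, 14)  # 168
print([s for s in range(block, 1009, block)])  # [168, 336, 504, 672, 840, 1008]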
That's a bug. Don't set shape; just set resolution when constructing the model (e.g. resolution=576, since 576 = 24 × 24). It should only need to be divisible by 24.
Thanks, this worked. But I found more bugs.
The mask output's name is printed as 4245. That is because "masks" was not added to output_names in https://github.com/roboflow/rf-detr/blob/9fd97893c221a8335a319e4d59f830ff8cfb1716/rfdetr/main.py#L543
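A minimal sketch of the fix, assuming output_names is assembled around that line (the exact surrounding code may differ):

output_names = ['dets', 'labels']
if self.args.segmentation_head:
    # Without an explicit name, torch.onnx.export auto-names the third
    # output with a node id like "4245", which is what shows up in the graph.
    output_names.append('masks')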
The initial mention of 2600 detections is also quite misleading. The export code calls self.model.eval() instead of calling model.eval() on the instance it actually runs, so the PyTorch inference pass executes in training mode, which is presumably why it reports 2600 queries while the exported ONNX model outputs the expected 200.
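A sketch of the likely fix in Model.export (the local model variable name is taken from the traceback above; its exact provenance may differ):

model.eval()  # self.model.eval() alone leaves this instance in train mode
with torch.no_grad():
    outputs = model(input_tensors)  # now reports the inference-time query count
    dets = outputs['pred_boxes']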
import os

import onnxruntime as ort

from rfdetr import RFDETRSegPreview

SegModel = RFDETRSegPreview
export_resolution = 576

model = SegModel(
    resolution=export_resolution,
    pretrain_weights=pretrain_weights_path,
    device="cpu",
)
export_dir = os.path.join(output_dir, "export")
model.export(
    output_dir=export_dir,
    verbose=False,
)

session = ort.InferenceSession(os.path.join(export_dir, "inference_model.onnx"))
print("ONNX input details:")
for inp in session.get_inputs():  # `inp`/`out` avoid shadowing builtins
    print(inp.name, inp.shape, inp.type)
print("ONNX output details:")
for out in session.get_outputs():
    print(out.name, out.shape, out.type)
Output:
PyTorch inference output shapes - Boxes: torch.Size([1, 2600, 4]), Labels: torch.Size([1, 2600, 5]), Masks: torch.Size([1, 2600, 144, 144])
Successfully exported ONNX model: /home/projects/rf-detr/output/RFDETRSegPreview/export/inference_model.onnx
ONNX export completed successfully
ONNX input details:
input [1, 3, 576, 576] tensor(float)
ONNX output details:
dets [1, 200, 4] tensor(float)
labels [1, 200, 5] tensor(float)
4245 [1, 'Addlabels_dim_1', 144, 144] tensor(float)
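To sanity-check the exported graph end to end, a dummy forward pass works (a sketch; the input name "input" and the resolution are taken from the printout above):

import numpy as np

dummy = np.zeros((1, 3, 576, 576), dtype=np.float32)
for meta, arr in zip(session.get_outputs(), session.run(None, {"input": dummy})):
    print(meta.name, arr.shape)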