Problem with segmentation mask prediction in RFDETRSegPreview model
Search before asking
- [x] I have searched the RF-DETR issues and found no similar bug report.
Bug
When using the RFDETRSegPreview model, the predictions do not include segmentation masks. The condition `len(predictions) == 3` is never satisfied, indicating that the model only outputs detection results without the expected mask data.
Even though RFDETRSegPreview is supposed to support segmentation, it behaves as if the segmentation heads are not active or their output is not returned by the model.
Environment
⚙️ Environment:
OS: Windows
Python: 3.12
PyTorch: 2.8
Model: RFDETRSegPreview
Minimal Reproducible Example
```python
import io

import requests
import supervision as sv
from PIL import Image

from rfdetr import RFDETRBase, RFDETRNano, RFDETRSmall, RFDETRMedium, RFDETRSegPreview
from rfdetr.util.coco_classes import COCO_CLASSES

model = RFDETRSegPreview()
model.optimize_for_inference()

url = "https://media.roboflow.com/notebooks/examples/dog-2.jpeg"
image = Image.open(io.BytesIO(requests.get(url).content))
detections = model.predict(image, threshold=0.5)

labels = [
    f"{COCO_CLASSES[class_id]} {confidence:.2f}"
    for class_id, confidence in zip(detections.class_id, detections.confidence)
]
print(detections)

annotated_image = image.copy()
annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections, labels)

sv.plot_image(annotated_image)
```
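For what it's worth, a quick way to make the problem visible on the detections returned above (a minimal diagnostic sketch; it assumes `predict` returns a supervision `Detections` object, which is what the annotators above operate on):

```python
# Diagnostic on the detections returned by model.predict() above.
# With a working segmentation model, detections.mask should be an (N, H, W)
# boolean array with True pixels inside each instance.
print(type(detections.mask))
if detections.mask is not None:
    print(detections.mask.shape, detections.mask.dtype)
    print("any mask pixels set:", detections.mask.any())
```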
Additional
No response
Are you willing to submit a PR?
- [x] Yes, I'd like to help by submitting a PR!
Fix: `annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections)`
Works for me.
WSL, Ubuntu 22.04.
Python 3.10
CPU only.
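Applied to the MRE above, the annotation calls would then read (a minimal sketch; only the MaskAnnotator line changes, since as far as I can tell its `annotate` method does not take a `labels` argument):

```python
# Same as the MRE above, with only the MaskAnnotator call changed:
# it is passed just the scene and the detections, no labels.
annotated_image = image.copy()
annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections)

sv.plot_image(annotated_image)
```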
I'm having the same issue. Printing `detections.mask` gives me a tensor of False values, although the bounding boxes are detected properly.
OS: Windows
Python: 3.10.18
PyTorch: 2.8.0+cu126
During download of the segmentation preview checkpoint I get:

```
Using a different number of positional encodings than DINOv2, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Using patch size 12 instead of 14, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Loading pretrain weights
`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
```
Is it possible that the seg-preview checkpoint hosted is not the latest version? I saw there was a bug in #b58f3b1 related to this exact issue of including masks in the prediction results.
@Abdul-Mukit could you confirm for me that you used the inference package, and thus the seg-preview model that is hosted on Roboflow's website? I.e. the `get_model('rf-detr-seg-preview')` approach.
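For clarity, the hosted-model route I mean would look roughly like this (a sketch; the model id comes from the line above, and the exact `inference` package API may differ by version):

```python
# Hosted seg-preview model via Roboflow's inference package (assumed API sketch).
from inference import get_model
from PIL import Image

model = get_model("rf-detr-seg-preview")  # model id referenced above
image = Image.open("dog-2.jpeg")          # any local test image
results = model.infer(image)
print(results)
```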
@beyondmarti No, I didn't use the inference package.
I used the predict call as shown in the original post.
Also, please note that I ran in WSL, not in Windows.
Had the same issue on Ubuntu 20. I modified line 318 in detr.py. In the original code, `predictions` is reassigned to a two-key dict before the length check, so `len(predictions) == 3` can never be true and `pred_masks` is never added.

From:

```python
if isinstance(predictions, tuple):
    predictions = {
        "pred_logits": predictions[1],
        "pred_boxes": predictions[0],
    }
    if len(predictions) == 3:
        predictions["pred_masks"] = predictions[2]
```

To:

```python
if isinstance(predictions, tuple):
    if len(predictions) == 3:
        predictions = {
            "pred_logits": predictions[1],
            "pred_boxes": predictions[0],
            "pred_masks": predictions[2],
        }
    else:
        predictions = {
            "pred_logits": predictions[1],
            "pred_boxes": predictions[0],
        }
```
If I remove `model.optimize_for_inference()`, the segmentation masks appear.
Not sure what exactly fixed it for me, but I now get masks. @LachlanMares' solution did not do anything for me: the `predictions` argument I saw was already a dict containing masks, and the code above only applies when it is, for some reason, passed as a tuple.
Please, have you been able to solve this?
I get a tensor of False values in `output.mask`, which makes it impossible to extract the polygon (xyxy) coordinates I need.
Here:

```python
results = model.predict(
    images=Image.open(os.path.join(evaluation_images_path, os.listdir(evaluation_images_path)[0])),
    threshold=0.4,
)
```
`results[0].mask` returns a tensor that has only False values.
@AgbajeAyomipo Have you tried annotating the frame with `annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections)`? There should be False values; they correspond to the pixels of the image that are not included in the mask, in other words the majority.
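If the end goal is per-instance polygon coordinates, one way to get them from the boolean masks is via OpenCV contours (a sketch; `mask_to_polygons` here is an illustrative helper, not an rf-detr or supervision function):

```python
import cv2
import numpy as np

def mask_to_polygons(mask: np.ndarray) -> list:
    # mask: (H, W) boolean array for a single instance.
    # Returns a list of (K, 2) arrays of x, y contour points;
    # an all-False mask simply yields an empty list.
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    return [c.reshape(-1, 2) for c in contours if len(c) >= 3]

# detections.mask has shape (N, H, W), one boolean mask per detection.
for i, instance_mask in enumerate(detections.mask):
    polygons = mask_to_polygons(instance_mask)
    print(i, "polygons found:", len(polygons))
```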
@beyondmarti Thanks for the reply. However, I don't think I understand you completely. Here's what I mean: an output mask of shape (512, 512) that has only False values (zeros throughout) basically means the polygon is empty ([]). The problem is that there is still a confidence score for this detection.
So what am I to do?
Just for context, there are about 10 detections in the result (image), so how do I even put it together? I'm not new to segmentation or detection; it's just weird that there's a detection but no coordinates to show for it.
@AgbajeAyomipo You would have to show more of your code and your setup to get a clearer picture. But of course, if all of the values in the output mask are False, then something is wrong.
@beyondmarti here:

```python
model = RFDETRSegPreview(pretrain_weights="/kaggle/input/rfdetr-output/fold_1/checkpoint_best_ema.pth")
results = model.predict(
    images=Image.open(os.path.join(evaluation_images_path, os.listdir(evaluation_images_path)[0])),
    threshold=0.35,
)

[print(num, np.unique(i)) for num, i in enumerate(results.mask)]
```
This is the output:

```
0 [False True]
1 [False True]
2 [False True]
3 [False True]
4 [False True]
5 [False True]
6 [False True]
7 [False True]
8 [False True]
9 [False True]
10 [False True]
11 [False True]
12 [False True]
13 [False True]
14 [False True]
15 [False True]
16 [False True]
17 [False True]
18 [False True]
19 [False True]
20 [False True]
21 [False True]
22 [False True]
23 [False True]
24 [False True]
25 [False True]
26 [False True]
27 [False True]
28 [False True]
29 [False True]
30 [False True]
31 [False True]
32 [False True]
33 [False True]
34 [False True]
35 [False True]
36 [False True]
37 [False True]
38 [False]
39 [False True]
40 [False True]
```

The mask at index 38 contains only False values (all zeros). This is weird given the fact that the detection still has a confidence score.
@AgbajeAyomipo Then I reiterate my previous question: did you try to annotate the masks? You clearly do get masks; it just seems that for one of your images it was not able to produce one.
If you are asking me specifically why it wasn't able to for that one image, then I wouldn't know. I have not researched how RF-DETR's segmentation head decoder works under the hood (yet).
@beyondmarti Thanks for your output. I visualized the mask and it is an empty image (all zeros). And this isn't different images: it is one image with about 40 separate detections, and I wanted to extract individual polygon coordinates per detected object. I hope you understand. Unless I am the one making a mistake here?
Well, I just dropped the instances with empty masks.
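In case it helps others doing the same, a sketch of dropping instances with empty masks (it assumes supervision's `Detections` supports boolean-array indexing, which recent versions do):

```python
import numpy as np

# Keep only detections whose boolean mask has at least one True pixel.
# results.mask has shape (N, H, W); has_mask is a length-N boolean array.
has_mask = np.array([m.any() for m in results.mask])
results_with_masks = results[has_mask]
print(f"kept {len(results_with_masks)} of {len(results)} detections")
```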
@AgbajeAyomipo Ooooh, okay. My mistake. But still, that means 39 out of 40 objects had a mask detected, right? Again, not sure why the final detection does not result in a mask. Maybe one of the creators can explain this behaviour.
Thanks, I'll work with what I have for now; at least 75% of the detections have masks.
The model generates per-pixel logits for the mask, then thresholds them. Probably, for some reason, that object just isn't being found. I'd visualize the box and see if you can guess why; it's impossible to say without looking at your data. But one thought is that your confidence threshold may be too low.
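To illustrate the logits-then-threshold step described here (a generic sketch, not rf-detr's actual implementation; the 0.5 cutoff is an assumed value for illustration):

```python
import torch

# pred_mask_logits: per-pixel mask logits for a single predicted instance, shape (H, W).
pred_mask_logits = torch.randn(416, 416)

# Sigmoid turns logits into per-pixel probabilities; thresholding gives the boolean mask.
mask_probs = pred_mask_logits.sigmoid()
binary_mask = mask_probs > 0.5  # assumed cutoff

# If no pixel clears the cutoff, the instance ends up with an all-False mask,
# even though its box and confidence can still pass the detection threshold.
print("mask pixels above threshold:", int(binary_mask.sum()))
```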
Thank you. I suspected that it was due to the low confidence as well. I'll explore visualizations and post an update here if need be.