Problem with segmentation mask prediction in RFDETRSegPreview model
Search before asking
- [x] I have searched the RF-DETR issues and found no similar bug report.
Bug
When using the RFDETRSegPreview model, the predictions do not include segmentation masks. The condition `len(predictions) == 3` is never satisfied, indicating that the model only outputs detection results without the expected mask data.
Even though RFDETRSegPreview is supposed to support segmentation, it behaves as if the segmentation heads are not active or their output is not returned by the model.
Environment
⚙️ Environment:
OS: Windows
Python: 3.12
PyTorch: 2.8
Model: RFDETRSegPreview
Minimal Reproducible Example
```python
import io

import requests
import supervision as sv
from PIL import Image

from rfdetr import RFDETRBase, RFDETRNano, RFDETRSmall, RFDETRMedium, RFDETRSegPreview
from rfdetr.util.coco_classes import COCO_CLASSES

model = RFDETRSegPreview()
model.optimize_for_inference()

url = "https://media.roboflow.com/notebooks/examples/dog-2.jpeg"
image = Image.open(io.BytesIO(requests.get(url).content))
detections = model.predict(image, threshold=0.5)

labels = [
    f"{COCO_CLASSES[class_id]} {confidence:.2f}"
    for class_id, confidence in zip(detections.class_id, detections.confidence)
]
print(detections)

annotated_image = image.copy()
annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections, labels)

sv.plot_image(annotated_image)
```
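For what it's worth, a quick way to make the problem visible on the detections returned above (a minimal diagnostic sketch; it assumes `predict` returns a supervision `Detections` object, which is what the annotators above operate on):

```python
# Diagnostic on the detections returned by model.predict() above.
# With a working segmentation model, detections.mask should be an (N, H, W)
# boolean array with True pixels inside each instance.
print(type(detections.mask))
if detections.mask is not None:
    print(detections.mask.shape, detections.mask.dtype)
    print("any mask pixels set:", detections.mask.any())
```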
Additional
No response
Are you willing to submit a PR?
- [x] Yes, I'd like to help by submitting a PR!
Fix: `annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections)`
Works for me.
WSL, Ubuntu 22.04.
Python 3.10
CPU only.
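Applied to the MRE above, the annotation calls would then read (a minimal sketch; only the MaskAnnotator line changes, since as far as I can tell its `annotate` method does not take a `labels` argument):

```python
# Same as the MRE above, with only the MaskAnnotator call changed:
# it is passed just the scene and the detections, no labels.
annotated_image = image.copy()
annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections, labels)
annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections)

sv.plot_image(annotated_image)
```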
I'm having the same issue. Printing `detections.mask` gives me a tensor of False values, although the bounding boxes are detected properly.
OS: Windows
Python: 3.10.18
PyTorch: 2.8.0+cu126
During download of the segmentation preview checkpoint I get:

```
Using a different number of positional encodings than DINOv2, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Using patch size 12 instead of 14, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Loading pretrain weights
`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
```
Is it possible that the seg-preview checkpoint hosted is not the latest version? I saw there was a bug in #b58f3b1 related to this exact issue of including masks in the prediction results.
@Abdul-Mukit could you confirm for me that you used the inference package, and thus the seg-preview model that is hosted on Roboflow's website? I.e. the `get_model('rf-detr-seg-preview')` approach.
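For clarity, the hosted-model route I mean would look roughly like this (a sketch; the model id comes from the line above, and the exact `inference` package API may differ by version):

```python
# Hosted seg-preview model via Roboflow's inference package (assumed API sketch).
from inference import get_model
from PIL import Image

model = get_model("rf-detr-seg-preview")  # model id referenced above
image = Image.open("dog-2.jpeg")          # any local test image
results = model.infer(image)
print(results)
```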
@beyondmarti No, I didn't use the inference package.
I used the predict call as shown in the original post.
Also, please note that I ran in WSL, not in Windows.
Had the same issue on Ubuntu 20. I modified line 318 in detr.py. In the original code, `predictions` is reassigned to a two-key dict before the length check, so `len(predictions) == 3` can never be true and `pred_masks` is never added.

From:

```python
if isinstance(predictions, tuple):
    predictions = {
        "pred_logits": predictions[1],
        "pred_boxes": predictions[0],
    }
    if len(predictions) == 3:
        predictions["pred_masks"] = predictions[2]
```

To:

```python
if isinstance(predictions, tuple):
    if len(predictions) == 3:
        predictions = {
            "pred_logits": predictions[1],
            "pred_boxes": predictions[0],
            "pred_masks": predictions[2],
        }
    else:
        predictions = {
            "pred_logits": predictions[1],
            "pred_boxes": predictions[0],
        }
```
If I remove `model.optimize_for_inference()`, the segmentation masks appear.
Not sure what exactly fixed it for me, but I now get masks. @LachlanMares' solution did not do anything for me: the `predictions` argument I saw was already a dict containing masks, and the code above only applies when it is, for some reason, passed as a tuple.
Please, have you been able to solve this?
I get a tensor of False values in `output.mask`, which makes it impossible to extract the polygon (xyxy) coordinates I need.
Here:

```python
results = model.predict(
    images=Image.open(os.path.join(evaluation_images_path, os.listdir(evaluation_images_path)[0])),
    threshold=0.4,
)
```
`results[0].mask` returns a tensor that has only False values.
@AgbajeAyomipo Have you tried annotating the frame with `annotated_image = sv.MaskAnnotator().annotate(annotated_image, detections)`? There should be False values; they correspond to the pixels of the image that are not included in the mask, in other words the majority.
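If the end goal is per-instance polygon coordinates, one way to get them from the boolean masks is via OpenCV contours (a sketch; `mask_to_polygons` here is an illustrative helper, not an rf-detr or supervision function):

```python
import cv2
import numpy as np

def mask_to_polygons(mask: np.ndarray) -> list:
    # mask: (H, W) boolean array for a single instance.
    # Returns a list of (K, 2) arrays of x, y contour points;
    # an all-False mask simply yields an empty list.
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    return [c.reshape(-1, 2) for c in contours if len(c) >= 3]

# detections.mask has shape (N, H, W), one boolean mask per detection.
for i, instance_mask in enumerate(detections.mask):
    polygons = mask_to_polygons(instance_mask)
    print(i, "polygons found:", len(polygons))
```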
@beyondmarti Thanks for the reply. However, I don't think I understand you completely. Here's what I mean: an output mask of shape (512, 512) that has only False values (zeros throughout) basically means the polygon is empty ([]). The problem is that there is still a confidence score for this detection.
So what am I to do?
Just for context, there are about 10 detections in the result (image), so how do I even put it together? I'm not new to segmentation or detection; it's just weird that there's a detection but no coordinates to show for it.
@AgbajeAyomipo You would have to show more of your code and your setup to get a clearer picture. But of course, if all of the values in the output mask are False, then something is wrong.
@beyondmarti here:

```python
model = RFDETRSegPreview(pretrain_weights="/kaggle/input/rfdetr-output/fold_1/checkpoint_best_ema.pth")
results = model.predict(
    images=Image.open(os.path.join(evaluation_images_path, os.listdir(evaluation_images_path)[0])),
    threshold=0.35,
)

[print(num, np.unique(i)) for num, i in enumerate(results.mask)]
```
This is the output:

```
0 [False True]
1 [False True]
2 [False True]
3 [False True]
4 [False True]
5 [False True]
6 [False True]
7 [False True]
8 [False True]
9 [False True]
10 [False True]
11 [False True]
12 [False True]
13 [False True]
14 [False True]
15 [False True]
16 [False True]
17 [False True]
18 [False True]
19 [False True]
20 [False True]
21 [False True]
22 [False True]
23 [False True]
24 [False True]
25 [False True]
26 [False True]
27 [False True]
28 [False True]
29 [False True]
30 [False True]
31 [False True]
32 [False True]
33 [False True]
34 [False True]
35 [False True]
36 [False True]
37 [False True]
38 [False]
39 [False True]
40 [False True]
```

The mask at index 38 contains only False values (all zeros). This is weird given the fact that the detection still has a confidence score.
@AgbajeAyomipo Then I reiterate my previous question: did you try to annotate the masks? You clearly do get masks; it just seems that for one of your images it was not able to produce one.
If you are asking me specifically why it wasn't able to for that one image, then I wouldn't know. I have not researched how RF-DETR's segmentation head decoder works under the hood (yet).
@beyondmarti Thanks for your output. I visualized the mask and it is an empty image (all zeros). And this isn't different images: it is one image with about 40 separate detections, and I wanted to extract individual polygon coordinates per detected object. I hope you understand. Unless I am the one making a mistake here?
Well, I just dropped the instances with empty masks.
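In case it helps others doing the same, a sketch of dropping instances with empty masks (it assumes supervision's `Detections` supports boolean-array indexing, which recent versions do):

```python
import numpy as np

# Keep only detections whose boolean mask has at least one True pixel.
# results.mask has shape (N, H, W); has_mask is a length-N boolean array.
has_mask = np.array([m.any() for m in results.mask])
results_with_masks = results[has_mask]
print(f"kept {len(results_with_masks)} of {len(results)} detections")
```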
@AgbajeAyomipo Ooooh, okay. My mistake. But still, that means 39 out of 40 objects had a mask detected, right? Again, not sure why the final detection does not result in a mask. Maybe one of the creators can explain this behaviour.
Thanks, I'll work with what I have for now; at least 75% of the detections have masks.
The model generates per-pixel logits for the mask, then thresholds them. Probably, for some reason, that object just isn't being found. I'd visualize the box and see if you can guess why; it's impossible to say without looking at your data. But one thought is that your confidence threshold may be too low.
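To illustrate the logits-then-threshold step described here (a generic sketch, not rf-detr's actual implementation; the 0.5 cutoff is an assumed value for illustration):

```python
import torch

# pred_mask_logits: per-pixel mask logits for a single predicted instance, shape (H, W).
pred_mask_logits = torch.randn(416, 416)

# Sigmoid turns logits into per-pixel probabilities; thresholding gives the boolean mask.
mask_probs = pred_mask_logits.sigmoid()
binary_mask = mask_probs > 0.5  # assumed cutoff

# If no pixel clears the cutoff, the instance ends up with an all-False mask,
# even though its box and confidence can still pass the detection threshold.
print("mask pixels above threshold:", int(binary_mask.sum()))
```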
Thank you. I suspected that it was due to the low confidence as well. I'll explore visualizations and post an update here if need be.