ultralytics YOLOE visual prompt with mask causes shape error

Search before asking

[x] I have searched the Ultralytics YOLO issues and found no similar bug report.

Ultralytics YOLO Component

Predict

Bug

I am experimenting with YOLOE visual prompts and hit a problem when using masks as the prompt input.

My code looks more or less like this:

from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOEVPSegPredictor

model = YOLOE("yoloe-11s-seg.pt")
pred = YOLOEVPSegPredictor()
pred.setup_model(model.model)

prompts = dict(
    masks=masks,  # masks is a numpy float32 array of shape (1, H, W)
    cls=cls,  # numpy array [1]
)
pred.set_prompts(prompts)
vpe = pred.get_vpe(img_path)  # <-- here it fails

.../predict.py:73, in YOLOEVPDetectPredictor.pre_transform(self, im)
     71 category = self.prompts["cls"]
     72 if len(img) == 1:
...
   4571         )
   4572     if not torch.jit.is_scripting():
   4573         if not all(_is_integer(x) for x in size):

ValueError: Input and output must have the same number of spatial dimensions, but got input with spatial dimensions of [640, 640, 1] and output size of (80, 80). Please provide input tensor in (N, C, d1, d2, ...,dK) format and output size in (o1, o2, ...,oK) format.

Suggested Fix

I traced it back to this line: https://github.com/ultralytics/ultralytics/blob/fa5d0bffbd63adfc302e9c118ffe5caa6447a044/ultralytics/models/yolo/yoloe/predict.py#L125. At that point the masks actually have shape (1, H, W, 1) instead of (1, H, W). It looks like BasePredictor.pre_transform (called on the previous line) adds a color dimension where it is not wanted.

Replacing this line with

 masks = np.squeeze(np.stack(resized_masks), axis=-1)

fixes the problem (for me).
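The shape change can be illustrated with NumPy alone. This is only a sketch: `resized_masks` below is a hypothetical stand-in for what the per-mask pre_transform call appears to return (one array of shape (H, W, 1) per mask), not the actual predictor code.

```python
import numpy as np

H, W = 640, 640

# Hypothetical stand-in: one resized mask per prompt, each carrying an
# unwanted trailing channel axis, i.e. shape (H, W, 1).
resized_masks = [np.zeros((H, W, 1), dtype=np.float32)]

# Stacking keeps the trailing axis, giving (1, H, W, 1).
stacked = np.stack(resized_masks)

# Squeezing only the last axis restores the expected (1, H, W) layout.
fixed = np.squeeze(stacked, axis=-1)

print(stacked.shape)
print(fixed.shape)
```

Passing axis=-1 instead of calling np.squeeze without an axis keeps the leading mask dimension even when there is only one mask, and it raises a ValueError if the last axis ever has a size other than 1.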

Environment

Ultralytics 8.3.208 Python-3.10.9 torch-2.7.1+cpu CPU (11th Gen Intel Core i7-11700 @ 2.50GHz)
Setup complete (16 CPUs, 63.7 GB RAM, 708.4/953.0 GB disk)

OS                  Windows-10-10.0.26100-SP0
Environment         Windows
Python              3.10.9
Install             pip
Path                [../yoloe-tests/.venv/Lib/site-packages/ultralytics)
RAM                 63.73 GB
Disk                708.4/953.0 GB
CPU                 11th Gen Intel Core i7-11700 @ 2.50GHz
CPU count           16
GPU                 None
GPU count           None
CUDA                None

numpy               2.2.6>=1.23.0
matplotlib          3.10.7>=3.3.0
opencv-python       4.12.0.88>=4.6.0
pillow              11.3.0>=7.1.2
pyyaml              6.0.3>=5.3.1
requests            2.32.5>=2.23.0
scipy               1.15.3>=1.4.1
torch               2.7.1>=1.8.0
torch               2.7.1!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision         0.22.1>=0.9.0
psutil              7.1.0
polars              1.34.0
ultralytics-thop    2.0.17>=2.0.0

Minimal Reproducible Example

-/-

Additional

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!

Oct 10 '25 08:10 sfixl

👋 Hello @sfixl, thank you for reporting this and for your detailed notes and suggested fix 🙏 This is an automated response to help move things forward quickly—an Ultralytics engineer will also review and assist soon.

We recommend a visit to the Docs for helpful guidance, including many Python and CLI examples.

If this is a 🐛 Bug Report, please provide a minimum reproducible example (MRE) to help us debug it. Since you’re working with YOLOE visual prompts and masks, an ideal MRE would include:

Exact code to reproduce (including model weights path and how masks and cls are constructed)
A tiny sample image and corresponding mask(s) that trigger the error
The precise shapes and dtypes of inputs at each step (e.g., masks before/after pre_transform)
Your current ultralytics version and environment details (you shared these—thank you). Please also confirm whether the issue persists after upgrading to the latest release

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the Ultralytics community where it suits you best. For real-time chat, head to Discord 🎧. Prefer in-depth discussions? Check out Discourse. Or dive into threads on our Subreddit to share knowledge with the community.

Upgrade

Upgrade to the latest ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8 to verify your issue is not already resolved in the latest version:

pip install -U ultralytics

Environments

YOLO may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLO Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Additional notes based on your report:

Thanks for pinpointing the line in yoloe/predict.py and for suggesting a potential change. If you’re willing, you can open a small PR referencing your finding and the exact shapes you observed to help us review faster.
Please also confirm whether this behavior reproduces on the latest main with a minimal input (single image + single mask), and share the output of yolo env along with the MRE.

We appreciate your help in improving Ultralytics 🙌

Oct 10 '25 08:10 UltralyticsAssistant

Can you open a PR for it?

Oct 10 '25 08:10 Y-T-G

@Y-T-G I fixed it in this fork: https://github.com/sfixl/ultralytics

I prefer to leave it to the Ultralytics maintainers to fix the error the way they prefer. I don't know the internals well enough, but I guess there are many possible fixes.

Oct 10 '25 08:10 sfixl

You can open the PR, and the maintainers can edit it if needed.

Oct 10 '25 08:10 Y-T-G

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

Nov 10 '25 00:11 github-actions[bot]