YOLOE visual prompt with mask causes shape error
Search before asking
- [x] I have searched the Ultralytics YOLO issues and found no similar bug report.
Ultralytics YOLO Component
Predict
Bug
I am experimenting with YOLOE visual prompts and hit a problem with using masks as prompt input.
My code looks more or less like this:
from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOEVPSegPredictor

model = YOLOE("yoloe-11s-seg.pt")
pred = YOLOEVPSegPredictor()
pred.setup_model(model.model)
prompts = dict(
    masks=masks,  # masks is a numpy float32 array of shape (1, H, W)
    cls=cls,  # numpy array [1]
)
pred.set_prompts(prompts)
vpe = pred.get_vpe(img_path)  # <-- here it fails
.../predict.py:73, in YOLOEVPDetectPredictor.pre_transform(self, im)
71 category = self.prompts["cls"]
72 if len(img) == 1:
...
4571 )
4572 if not torch.jit.is_scripting():
4573 if not all(_is_integer(x) for x in size):
ValueError: Input and output must have the same number of spatial dimensions, but got input with spatial dimensions of [640, 640, 1] and output size of (80, 80). Please provide input tensor in (N, C, d1, d2, ...,dK) format and output size in (o1, o2, ...,oK) format.
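The message describes a rank check: for an input laid out as (N, C, d1, ..., dK), the requested output size must have exactly K entries. A minimal pure-Python sketch of that rule shows why a single trailing axis of length 1 is enough to trigger it. The helper names are hypothetical, and the 5-D shape is inferred from the spatial dimensions quoted in the error, not taken from the Ultralytics or PyTorch source.

```python
def spatial_dims(shape):
    """Return the dimensions after the leading batch (N) and channel (C) axes."""
    return list(shape[2:])


def size_matches(shape, size):
    """True when the output size has one entry per spatial dimension."""
    return len(spatial_dims(shape)) == len(size)


# Assumed shapes: a 5-D tensor whose spatial dims match the error message,
# and the 4-D tensor one would get once the trailing axis is removed.
with_extra_axis = (1, 1, 640, 640, 1)
without_extra_axis = (1, 1, 640, 640)

print(spatial_dims(with_extra_axis), size_matches(with_extra_axis, (80, 80)))
print(spatial_dims(without_extra_axis), size_matches(without_extra_axis, (80, 80)))
```

With the extra axis there are three spatial dimensions for a two-element output size, which is the mismatch reported above.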
Suggested Fix
I traced it back to this line:
https://github.com/ultralytics/ultralytics/blob/fa5d0bffbd63adfc302e9c118ffe5caa6447a044/ultralytics/models/yolo/yoloe/predict.py#L125
where the actual shape of the masks is (1, H, W, 1) instead of the expected (1, H, W).
It looks like BasePredictor.pre_transform (called on the previous line) adds a color dimension where it is not wanted.
Replacing this line with
masks = np.squeeze(np.stack(resized_masks), axis=-1)
fixes the problem (for me).
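For readers who want to see the shape change in isolation, here is a small NumPy sketch. `add_channel_axis` is a hypothetical stand-in for whatever BasePredictor.pre_transform does to each mask (its real implementation is not reproduced here), and the 8x8 mask size is arbitrary. Only the np.stack and np.squeeze calls correspond to the proposed change.

```python
import numpy as np


def add_channel_axis(images):
    """Hypothetical stand-in: give each 2-D array a trailing axis of size 1."""
    return [img[..., None] if img.ndim == 2 else img for img in images]


# One float32 mask of shape (1, H, W), as in the report (H = W = 8 here).
masks = np.zeros((1, 8, 8), dtype=np.float32)

resized_masks = add_channel_axis(masks)  # list of (H, W, 1) arrays
stacked = np.stack(resized_masks)        # (1, H, W, 1): the unwanted extra axis
fixed = np.squeeze(stacked, axis=-1)     # (1, H, W): the proposed fix

print(stacked.shape, fixed.shape)
```

Because np.squeeze with an explicit axis raises a ValueError when that axis is not of length 1, this form would fail loudly on a multi-channel array rather than silently reshape it.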
Environment
Ultralytics 8.3.208 Python-3.10.9 torch-2.7.1+cpu CPU (11th Gen Intel Core i7-11700 @ 2.50GHz)
Setup complete (16 CPUs, 63.7 GB RAM, 708.4/953.0 GB disk)

OS Windows-10-10.0.26100-SP0
Environment Windows
Python 3.10.9
Install pip
Path [../yoloe-tests/.venv/Lib/site-packages/ultralytics)
RAM 63.73 GB
Disk 708.4/953.0 GB
CPU 11th Gen Intel Core i7-11700 @ 2.50GHz
CPU count 16
GPU None
GPU count None
CUDA None

numpy 2.2.6>=1.23.0
matplotlib 3.10.7>=3.3.0
opencv-python 4.12.0.88>=4.6.0
pillow 11.3.0>=7.1.2
pyyaml 6.0.3>=5.3.1
requests 2.32.5>=2.23.0
scipy 1.15.3>=1.4.1
torch 2.7.1>=1.8.0
torch 2.7.1!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision 0.22.1>=0.9.0
psutil 7.1.0
polars 1.34.0
ultralytics-thop 2.0.17>=2.0.0
Minimal Reproducible Example
-/-
Additional
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
👋 Hello @sfixl, thank you for reporting this and for your detailed notes and suggested fix 🙏 This is an automated response to help move things forward quickly—an Ultralytics engineer will also review and assist soon.
We recommend a visit to the Docs for helpful guidance, including many Python and CLI examples.
If this is a 🐛 Bug Report, please provide a minimum reproducible example (MRE) to help us debug it. Since you’re working with YOLOE visual prompts and masks, an ideal MRE would include:
- Exact code to reproduce (including model weights path and how masks and cls are constructed)
- A tiny sample image and corresponding mask(s) that trigger the error
- The precise shapes and dtypes of inputs at each step (e.g., masks before/after pre_transform)
- Your current ultralytics version and environment details (you shared these, thank you). Please also confirm whether the issue persists after upgrading to the latest release
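One way to capture the requested shapes and dtypes is a small logging helper. This is only a sketch: `describe` is a hypothetical name, and the synthetic mask and class arrays below are assumptions modelled on the report (a float32 mask of shape (1, H, W) and a one-element class array), not the reporter's actual data.

```python
import numpy as np


def describe(name, arr):
    """Format the shape and dtype of an array for inclusion in a bug report."""
    arr = np.asarray(arr)
    return f"{name}: shape={arr.shape}, dtype={arr.dtype}"


# Synthetic inputs; H = W = 640 matches the size seen in the error message.
masks = np.zeros((1, 640, 640), dtype=np.float32)
masks[0, 100:200, 150:300] = 1.0  # a rectangular foreground region
cls = np.array([1])

print(describe("masks", masks))
print(describe("cls", cls))
```

Calling the same helper on the arrays before and after each transformation step would show where an extra axis appears.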
If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.
Join the Ultralytics community where it suits you best. For real-time chat, head to Discord 🎧. Prefer in-depth discussions? Check out Discourse. Or dive into threads on our Subreddit to share knowledge with the community.
Upgrade
Upgrade to the latest ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8 to verify your issue is not already resolved in the latest version:
pip install -U ultralytics
Environments
YOLO may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Additional notes based on your report:
- Thanks for pinpointing the line in yoloe/predict.py and for suggesting a potential change. If you’re willing, you can open a small PR referencing your finding and the exact shapes you observed to help us review faster.
- Please also confirm whether this behavior reproduces on the latest main with a minimal input (single image + single mask), and share the output of yolo env along with the MRE.
We appreciate your help in improving Ultralytics 🙌
Can you open a PR for it?
@Y-T-G I fixed it in this fork: https://github.com/sfixl/ultralytics
I would prefer to leave it to the Ultralytics maintainers to fix the error in whatever way they see fit. I don't know the internals well enough, but I guess there are many possible fixes.
You can open the PR, and the maintainers can edit it if needed.
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐