`ValueError` when training `yoloe-11n-seg` from scratch
Search before asking
- [x] I have searched the Ultralytics YOLO issues and found no similar bug report.
Ultralytics YOLO Component
Train
Bug
I'm trying to train the yoloe-11n-seg model from scratch. Following the instructions on the YOLOE page, I downloaded all datasets and created the script below, with only minor changes to account for limited GPU resources (see the MRE below). However, when I run the script I hit the following error:
Traceback (most recent call last):
  File "train_yoloe.py", line 22, in <module>
    model.train(
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/engine/model.py", line 791, in train
    self.trainer.train()
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 211, in train
    self._do_train(world_size)
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 385, in _do_train
    loss, self.loss_items = self.model(batch)
  File "/home/user/airchair/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/airchair/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 119, in forward
    return self.loss(x, *args, **kwargs)
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 1043, in loss
    return self.criterion(preds, batch)
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/utils/loss.py", line 332, in __call__
    masks = F.interpolate(masks[None], (mask_h, mask_w), mode="nearest")[0]
  File "/home/user/airchair/lib/python3.8/site-packages/torch/nn/functional.py", line 3983, in interpolate
    raise ValueError(
ValueError: Input and output must have the same number of spatial dimensions, but got input with spatial dimensions of [] and output size of (160, 160). Please provide input tensor in (N, C, d1, d2, ...,dK) format and output size in (o1, o2, ...,oK) format.
Any guesses on what's going wrong?
Environment
Ultralytics 8.3.107 🚀 Python-3.8.10 torch-2.4.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3090, 24135MiB)
Setup complete ✅ (20 CPUs, 94.0 GB RAM, 1645.0/1832.7 GB disk)
OS Linux-6.8.0-52-generic-x86_64-with-glibc2.29
Environment Linux
Python 3.8.10
Install pip
Path /home/user/airchair/lib/python3.8/site-packages/ultralytics
RAM 93.98 GB
Disk 1645.0/1832.7 GB
CPU Intel Core(TM) i9-10900X 3.70GHz
CPU count 20
GPU NVIDIA GeForce RTX 3090, 24135MiB
GPU count 2
CUDA 12.1
numpy ✅ 1.24.4<=2.1.1,>=1.23.0
matplotlib ✅ 3.7.5>=3.3.0
opencv-python ✅ 4.11.0.86>=4.6.0
pillow ✅ 8.4.0>=7.1.2
pyyaml ✅ 5.3.1>=5.3.1
requests ✅ 2.32.3>=2.23.0
scipy ✅ 1.10.1>=1.4.1
torch ✅ 2.4.1>=1.8.0
torch ✅ 2.4.1!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision ✅ 0.19.1>=0.9.0
tqdm ✅ 4.67.1>=4.64.0
psutil ✅ 5.9.8
py-cpuinfo ✅ 9.0.0
pandas ✅ 2.0.3>=1.1.4
seaborn ✅ 0.13.2>=0.11.0
ultralytics-thop ✅ 2.0.14>=2.0.0
Minimal Reproducible Example
from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOESegTrainerFromScratch
data = dict(
    train=dict(
        yolo_data=["Objects365.yaml"],
        grounding_data=[
            dict(
                img_path="../datasets/flickr/full_images/",
                json_file="../datasets/flickr/annotations/final_flickr_separateGT_train_segm.json",
            ),
            dict(
                img_path="../datasets/mixed_grounding/gqa/images",
                json_file="../datasets/mixed_grounding/annotations/final_mixed_train_no_coco_segm.json",
            ),
        ],
    ),
    val=dict(yolo_data=["lvis.yaml"]),
)

model = YOLOE("yoloe-11n-seg.yaml")
model.train(
    data=data,
    batch=4,  # Reduce batch size to account for smaller GPU memory
    epochs=960,  # Increase epochs to compensate
    close_mosaic=2,
    optimizer="AdamW",
    lr0=2e-3,
    warmup_bias_lr=0.0,
    weight_decay=0.025,
    momentum=0.9,
    workers=4,
    trainer=YOLOESegTrainerFromScratch,
    device="0",  # Use a single GPU instead of 8
)
Additional
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!
👋 Hello @xperroni, thank you for your detailed bug report and for providing a thorough minimal reproducible example (MRE) 🚀! This is an automated response to help get you started; an Ultralytics engineer will also review and assist you soon.
For new users, we recommend checking the Docs where you'll find helpful Python and CLI usage examples. Many common questions are answered there, which might help you troubleshoot while you wait.
If you have further details or updates to your MRE, please share them to help our team reproduce and resolve the issue efficiently 🛠️.
Join the Ultralytics community for discussion and support.
Upgrade
Please ensure you are using the latest ultralytics package and all required dependencies in a Python>=3.8 environment with PyTorch>=1.8, as your issue may already be resolved in a newer release:
pip install -U ultralytics
See all requirements in pyproject.toml.
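After upgrading, you can confirm the active version and regenerate the environment report shown above. A quick sketch using the public `ultralytics` API:

```python
import ultralytics

print(ultralytics.__version__)  # confirm the upgraded version is the one in use
ultralytics.checks()  # prints the same environment summary shown above
```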
Environments
YOLO models can be run in any of these up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python, and PyTorch preinstalled):
- Notebooks with free GPU:
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all Ultralytics CI tests are passing. CI tests verify correct operation of all YOLO Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
Thanks again for your report! An Ultralytics engineer will follow up soon.
Can you try the latest version?
Upgraded to version 8.3.130, as shown in the environment report below. I'm still getting the same error.
Ultralytics 8.3.130 🚀 Python-3.8.10 torch-2.4.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3090, 24135MiB)
Setup complete ✅ (20 CPUs, 94.0 GB RAM, 1645.0/1832.7 GB disk)
OS Linux-6.8.0-52-generic-x86_64-with-glibc2.29
Environment Linux
Python 3.8.10
Install pip
Path /home/user/airchair/lib/python3.8/site-packages/ultralytics
RAM 93.98 GB
Disk 1645.0/1832.7 GB
CPU Intel Core(TM) i9-10900X 3.70GHz
CPU count 20
GPU NVIDIA GeForce RTX 3090, 24135MiB
GPU count 2
CUDA 12.1
numpy ✅ 1.24.4>=1.23.0
matplotlib ✅ 3.7.5>=3.3.0
opencv-python ✅ 4.11.0.86>=4.6.0
pillow ✅ 8.4.0>=7.1.2
pyyaml ✅ 5.3.1>=5.3.1
requests ✅ 2.32.3>=2.23.0
scipy ✅ 1.10.1>=1.4.1
torch ✅ 2.4.1>=1.8.0
torch ✅ 2.4.1!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision ✅ 0.19.1>=0.9.0
tqdm ✅ 4.67.1>=4.64.0
psutil ✅ 5.9.8
py-cpuinfo ✅ 9.0.0
pandas ✅ 2.0.3>=1.1.4
seaborn ✅ 0.13.2>=0.11.0
ultralytics-thop ✅ 2.0.14>=2.0.0
@Laughing-q Can you check?
I did notice, however, that I get different errors depending on the value of the batch argument. For example, if I set batch back to 128, this is what I get:
Traceback (most recent call last):
  File "train_yoloe.py", line 22, in <module>
    model.train(
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/engine/model.py", line 793, in train
    self.trainer.train()
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 212, in train
    self._do_train(world_size)
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 385, in _do_train
    batch = self.preprocess_batch(batch)
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/models/yolo/yoloe/train.py", line 200, in preprocess_batch
    txt_feats = torch.stack([self.text_embeddings[text] for text in texts]).to(self.device)
  File "/home/user/airchair/lib/python3.8/site-packages/ultralytics/models/yolo/yoloe/train.py", line 200, in <listcomp>
    txt_feats = torch.stack([self.text_embeddings[text] for text in texts]).to(self.device)
KeyError: 'what'
Thanks for providing the detailed error information and testing with different batch sizes. This appears to be an issue with the segmentation mask processing during training.
The first error indicates a problem with mask dimensions during interpolation - the input mask has no spatial dimensions but the function is trying to resize it to 160x160. This typically happens when there's a mismatch between the expected segmentation mask format and what's being provided in the dataset.
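For what it's worth, that failure can be reproduced standalone; the shape below is illustrative, not the actual training tensor:

```python
import torch
import torch.nn.functional as F

masks = torch.zeros(4)  # degenerate mask tensor with no H x W dimensions
# masks[None] has shape (1, 4): batch and channel dims only, zero spatial dims,
# while the requested output size (160, 160) implies two spatial dims -> same ValueError
F.interpolate(masks[None], (160, 160), mode="nearest")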
The second error with larger batch sizes (KeyError: 'what') suggests an issue with text embeddings - the model is trying to load an embedding for the word "what" which isn't in the embedding dictionary.
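In toy form (names hypothetical), the lookup that fails looks like this: the trainer caches embeddings for known category names, and any text not in that cache raises the KeyError:

```python
import torch

# Hypothetical cache keyed by known category names
text_embeddings = {"person": torch.randn(512), "car": torch.randn(512)}

texts = ["person", "what"]  # "what" was never embedded
txt_feats = torch.stack([text_embeddings[t] for t in texts])  # KeyError: 'what'
```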
This could be related to dataset preparation or compatibility issues with the current implementation. Since you're training from scratch, I'd suggest:
- Check that your datasets (Objects365, Flickr, and mixed_grounding) are properly formatted for segmentation tasks
- Try with a pre-built model first (using yoloe-11n-seg.pt instead of yoloe-11n-seg.yaml), as sketched after this list
- Consider reducing the complexity by starting with just one dataset to isolate the issue
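A minimal sketch of the pre-built model suggestion, reusing the `data` dict and arguments from your MRE:

```python
from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOESegTrainerFromScratch

# Load released weights instead of building from a bare YAML config
model = YOLOE("yoloe-11n-seg.pt")
model.train(data=data, batch=4, workers=4, trainer=YOLOESegTrainerFromScratch, device="0")
```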
These issues often relate to dataset preprocessing specifics that are expected by the YOLOE implementation. I'll ask our development team to look more closely at the YOLOE segmentation training code path.
@xperroni Hey, the error suggests to me that you were training a segmentation model with detection labels. Did you use the generated mask annotations for the Objects365 dataset?
https://docs.ultralytics.com/models/yoloe/#train-official-models
and use the script from the official team to convert the segmentation JSON labels to YOLO format?
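For reference, a hedged sketch of such a conversion using the built-in `ultralytics.data.converter.convert_coco` helper; the official script may differ, and the paths below are hypothetical:

```python
from ultralytics.data.converter import convert_coco

# Convert COCO-style segmentation JSON annotations to YOLO txt labels
convert_coco(
    labels_dir="../datasets/object365/annotations/",  # hypothetical location of the *_segm.json files
    save_dir="../datasets/object365/labels_segm/",  # hypothetical output directory
    use_segments=True,  # keep polygon masks rather than boxes only
    cls91to80=False,  # Objects365 classes are not COCO's 91-class set
)
```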
Alternatively, I wonder if you could directly use the objects365_train_segm.json as grounding_data by doing:
data = dict(
    train=dict(
        grounding_data=[
            dict(
                img_path="../datasets/flickr/full_images/",
                json_file="../datasets/flickr/annotations/final_flickr_separateGT_train_segm.json",
            ),
            dict(
                img_path="../datasets/mixed_grounding/gqa/images",
                json_file="../datasets/mixed_grounding/annotations/final_mixed_train_no_coco_segm.json",
            ),
            dict(
                img_path="../datasets/object365/images",
                json_file="../datasets/object365/annotations/objects365_train_segm.json",
            ),
        ],
    ),
    val=dict(yolo_data=["lvis.yaml"]),
)
I haven't tried Objects365 this way as I don't have the dataset locally for now. Please feel free to try it and let me know if there are any issues.
> Alternatively, I wonder if you could directly use the objects365_train_segm.json as grounding_data
Yes, that did the trick. I had previously run the training script without manually downloading Objects365 first, so it was auto-downloaded, but of course with detection annotations instead of segmentation ones.
I suppose a documentation update is in order...?
I'm glad to hear the solution worked, @xperroni! The issue was indeed related to annotation formats - the auto-downloaded Objects365 dataset uses detection annotations, while YOLOE segmentation models require proper segmentation mask annotations.
You make a valid point about documentation. This distinction between detection and segmentation annotations should be clearer in our docs, especially since the auto-downloaded version doesn't contain the required segmentation masks. I'll create an internal note to improve this documentation, making it more explicit that segmentation training requires different annotation formats than detection.
Thanks for reporting this issue and verifying the solution. This feedback helps us improve both the code and documentation for all users.
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐