HUB Dataset shows "Invalid! Unable to process the dataset." error
Search before asking
- [x] I have searched the HUB issues and discussions and found no similar questions.
Question
Hi, I tried to upload a dataset using the HUB Upload Dataset feature. The website shows that the dataset was uploaded successfully, but it then displays the "Invalid! Unable to process the dataset" error. Could you help me resolve this issue? My dataset has around 48 photos and around 20,000 labels. The data is for object detection.
Additional
👋 Hello @fluids, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:
- Quickstart. Start training and deploying YOLO models with HUB in seconds.
- Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
- Projects: Creating and Managing. Group your models into projects for improved organization.
- Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
- Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
- Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
- Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.
It seems you're encountering an issue with dataset processing. If this is a 🐛 Bug Report, please provide the following to help us investigate further:
- A detailed description of the steps you followed to upload the dataset.
- A minimum reproducible example (MRE) if possible.
- Screenshots or logs that could help pinpoint the issue.
If you have any ❓ Questions, please share additional details about your dataset format, structure, and any preprocessing steps you applied.
An Ultralytics engineer will review and assist you as soon as possible. Thank you for your patience and for helping us improve Ultralytics HUB! 😊
An update on the question above: the issue seems to be caused by a label file that has around 3300 labels. After removing that image and its label file from the dataset, the error disappears. However, it would be great if HUB (the failure seems related to the "Loading" process) and YOLO models could deal with a single image with over 3300 labels, which is not rare in large aerial photos.
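For anyone trying to locate a similarly dense label file, a minimal sketch along these lines may help. It assumes one YOLO-format `.txt` file per image with one object per line; `count_labels` and the directory layout are illustrative names, not part of HUB or the `ultralytics` package.

```python
from pathlib import Path


def count_labels(labels_dir):
    """Map each label file (path relative to labels_dir) to its number of
    non-blank lines, sorted with the densest file first."""
    root = Path(labels_dir)
    counts = {}
    for path in root.rglob("*.txt"):
        with open(path, encoding="utf-8") as f:
            counts[str(path.relative_to(root))] = sum(1 for line in f if line.strip())
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
```

Calling `count_labels("path/to/your_dataset/labels")` (a placeholder path) and looking at the first few entries should surface any file with an unusually high label count.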
@fluids hi there! 👋 Thanks for sharing the details and troubleshooting the issue by identifying the problematic label file. Let's help you resolve this effectively:
- Dataset Validation
  - We strongly recommend validating your dataset locally before upload using our check_dataset() function. This catches most common issues:

    ```python
    from ultralytics.hub import check_dataset

    check_dataset("path/to/your_dataset.zip", task="detect")
    ```
- Label File Requirements
  - For object detection, each label file should follow the YOLO format: class_id x_center y_center width height, with normalized coordinates (0-1)
  - Ensure no empty label files exist (all images should have corresponding labels)
  - Verify your label values are within valid ranges (e.g., coordinates ≤ 1.0)
- Large-Scale Labels Handling
  While YOLO models can technically handle dense label scenarios, we recommend:
  - Splitting ultra-dense aerial images into smaller tiles using tools like SAHI
  - Ensuring balanced label distribution across images
  - Using appropriate input resolutions (e.g., 1280px+ for aerial imagery)
- Next Steps
  - Try re-uploading after local validation
  - Share your dataset structure details if the issue persists (we'd be happy to investigate further)
  - Consider filing a GitHub issue with a minimal reproducible example if you believe this is a platform limitation
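As a rough local pre-check of the label rules listed above, something like the following sketch could be run over the label files before re-uploading. It only encodes the rules stated in this reply (five fields per line, integer class ID, coordinates in the 0-1 range, no empty files). It is not the logic used by `check_dataset()` or by HUB, and the function names are hypothetical.

```python
from pathlib import Path


def check_label_line(line):
    """Return an error message for one YOLO detection label line, or None if it looks valid."""
    parts = line.split()
    if len(parts) != 5:
        return f"expected 5 fields, got {len(parts)}"
    try:
        class_id = int(parts[0])
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return "non-numeric field"
    if class_id < 0:
        return "negative class_id"
    # NaN fails this comparison as well, so it is reported as out of range.
    if any(not 0.0 <= v <= 1.0 for v in coords):
        return "coordinate outside [0, 1]"
    return None


def check_label_file(path):
    """Return a list of (line_number, message) problems found in one label file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if not any(ln.strip() for ln in lines):
        return [(0, "empty label file")]
    problems = []
    for i, ln in enumerate(lines, start=1):
        if not ln.strip():
            continue  # skip blank lines but keep the original line numbering
        msg = check_label_line(ln)
        if msg:
            problems.append((i, msg))
    return problems
```

An empty list from `check_label_file` means the file passed these checks, which does not guarantee that HUB will accept the dataset.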
Thanks for your feedback about handling large label counts - we're always working to improve HUB's capabilities and will consider this use case in future updates! 🚀
For aerial-specific workflows, you might find our OBB documentation helpful for oriented object detection scenarios.
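To illustrate the tiling idea mentioned above, here is a simplified sketch of how YOLO labels for one large image could be redistributed across a grid of tiles. It is plain Python and does not use or reproduce SAHI. It assumes non-overlapping square tiles, assigns each box to the tile containing its center, and clips boxes at the tile border, whereas dedicated tools typically use overlapping slices. The function name `tile_labels` and its parameters are hypothetical.

```python
def tile_labels(labels, img_w, img_h, tile):
    """Assign YOLO boxes to non-overlapping square tiles by box center.

    labels: iterable of (class_id, xc, yc, w, h), normalized to the full image.
    Returns {(col, row): [(class_id, xc, yc, w, h), ...]} with coordinates
    re-normalized to each tile. Edge tiles may be smaller than `tile` pixels.
    """
    tiles = {}
    for cls, xc, yc, w, h in labels:
        # Box corners and center in full-image pixel coordinates.
        x1, y1 = (xc - w / 2) * img_w, (yc - h / 2) * img_h
        x2, y2 = (xc + w / 2) * img_w, (yc + h / 2) * img_h
        cx, cy = xc * img_w, yc * img_h
        # Tile index of the box center, clamped to the last tile.
        col = min(int(cx // tile), (img_w - 1) // tile)
        row = min(int(cy // tile), (img_h - 1) // tile)
        # Pixel extent of that tile.
        tx1, ty1 = col * tile, row * tile
        tx2, ty2 = min(tx1 + tile, img_w), min(ty1 + tile, img_h)
        tw, th = tx2 - tx1, ty2 - ty1
        # Clip the box to the tile, then re-normalize to tile size.
        x1, y1 = max(x1, tx1), max(y1, ty1)
        x2, y2 = min(x2, tx2), min(y2, ty2)
        tiles.setdefault((col, row), []).append((
            cls,
            ((x1 + x2) / 2 - tx1) / tw,
            ((y1 + y2) / 2 - ty1) / th,
            (x2 - x1) / tw,
            (y2 - y1) / th,
        ))
    return tiles
```

The image itself would need to be cropped with the same grid so that each tile and its label list stay aligned. With a fine enough grid, an image with several thousand labels ends up as many tiles with far fewer labels each.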
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐