
I am a HUB Pro user trying to upload data for pose detection; after upload it shows an "Unable to process the dataset" error.

Open kumarneeraj2005 opened this issue 9 months ago • 20 comments

Search before asking

  • [X] I have searched the HUB issues and found no similar bug report.

HUB Component

Datasets

Bug

Hello, Ultralytics HUB Pro Team. While uploading data for pose detection, I receive an error stating "Unable to process the dataset". Could you please look into this issue? Dataset size: 10.9 GB. I manually checked that all of the label formats and folder structures are correct.

[Screenshot: "Unable to process the dataset" error, 2024-04-28]

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

kumarneeraj2005 avatar Apr 28 '24 02:04 kumarneeraj2005

👋 Hello @kumarneeraj2005, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

github-actions[bot] avatar Apr 28 '24 02:04 github-actions[bot]

@kumarneeraj2005 hello! 😊

Thanks for reaching out and for being a part of our Hub Pro community. It seems like you've done a preliminary check on the dataset and everything looks in place, which is great! The "Unable to process the dataset" error can sometimes be caused by transient issues with our servers or by specific peculiarities within the dataset that aren't immediately obvious.

Here's what we recommend:

  1. Retry the upload - Sometimes, giving it another go can solve the issue if it was related to a temporary server hiccup.
  2. Check for hidden files or corrupt images - Occasionally, hidden files or corrupt images in the dataset can cause processing to fail. Make sure all files are in the expected format and can be opened without issues.
  3. Validate dataset size and format - Ensure your dataset size is within the limits for Pro users and that all files adhere to the expected formats for pose detection datasets.

If after trying these steps you're still facing the issue, could you provide us with some more details? Specifically:

  • Are there any specific file types in your dataset?
  • Have you been able to successfully upload other datasets for pose detection or is this the first one?

We're here to help you get this resolved! Thanks for your patience and for being a valued member of the Ultralytics community. For further assistance, please refer to our docs at https://docs.ultralytics.com/hub which might give you more detailed guidance on dataset requirements and troubleshooting steps.

pderrenger avatar Apr 28 '24 05:04 pderrenger

We did all three steps and everything is fine, but we are still getting the same error. Could you please check from the backend? If needed, I will share details; please provide your support email ID. We have been struggling since yesterday and have tried more than 10 times.

kumarneeraj2005 avatar Apr 28 '24 06:04 kumarneeraj2005

@kumarneeraj2005 You can validate your dataset like this (before uploading it to Ultralytics HUB):

```python
from ultralytics.hub import check_dataset

check_dataset('path/to/coco8.zip')
```

Please let us know if the local validation is successful. If it isn't, the check_dataset function should give you insights to help you figure out what is wrong with your dataset.

sergiuwaxmann avatar Apr 28 '24 06:04 sergiuwaxmann

@sergiuwaxmann Local validation is successful. Please help us resolve the above issue.

Starting HUB dataset checks for /Users/ashishjha/Downloads/data/pose_data.zip....
Scanning /Users/ashishjha/Downloads/data/pose_data/labels/train... 61156 images, 3 backgrounds, 0 corrupt: 100%|██████████| 61156/61156 [00:14<00:00, 4157.47it/s]
New cache created: /Users/ashishjha/Downloads/data/pose_data/labels/train.cache
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Statistics: 100%|██████████| 61156/61156 [00:00<00:00, 710238.39it/s]
Scanning /Users/ashishjha/Downloads/data/pose_data/labels/val... 3219 images, 0 backgrounds, 0 corrupt: 100%|██████████| 3219/3219 [00:00<00:00, 4241.20it/s]
New cache created: /Users/ashishjha/Downloads/data/pose_data/labels/val.cache
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Statistics: 100%|██████████| 3219/3219 [00:00<00:00, 809003.81it/s]

kumarneeraj2005 avatar Apr 28 '24 15:04 kumarneeraj2005

Hello, Ultralytics team. Could you please let me know if you have any support option for the Pro subscription, or is it simply Pro? I am very frustrated with your solution.

kumarneeraj2005 avatar Apr 29 '24 10:04 kumarneeraj2005

Hello @kumarneeraj2005!

We just checked and everything seems to be working correctly on our end. For example, we can successfully upload our example pose dataset to Ultralytics HUB.

Of course, the most important aspect is having a valid dataset format. The check_dataset function should tell you if you can upload the dataset to Ultralytics HUB or not (if your dataset is formatted correctly, you should see Checks completed correctly ✅. Upload this dataset to https://hub.ultralytics.com/datasets/. in your console). Please see below an example of a Pose dataset formatted correctly. [Screenshot: pose dataset format example]
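
For reference, a minimal data.yaml for a pose dataset looks like the sketch below (illustrative values modeled on the public COCO8-pose example; your paths, keypoint shape, and class names will differ):

```yaml
# Illustrative data.yaml for a pose dataset (example values only)
path: pose_data          # dataset root directory
train: images/train      # train images, relative to path
val: images/val          # val images, relative to path

kpt_shape: [17, 3]       # [num keypoints, dims per keypoint (x, y, visibility)]
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]  # left/right keypoint swap for horizontal flips

names:
  0: person
```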

Here are some things that you can check in Ultralytics HUB:

  1. Make sure you select "Pose" when you upload the dataset to Ultralytics HUB. [Screenshot: dataset upload with "Pose" task selected]
  2. Make sure you have enough space on your Ultralytics HUB account. [Screenshot: account storage]

If the points above do not help you, please share your account's email or a dataset/project/model ID with us so we can investigate your account.

sergiuwaxmann avatar Apr 29 '24 11:04 sergiuwaxmann

@yogendrasinghx @sergiuwaxmann

[Screenshots: dataset size check and dataset upload]

As you suggested above, everything seems to be fine on our end. The dataset is fully valid per your platform requirements.

For your reference, I am sharing my user ID and project ID:

UserID : [email protected]

Please do let me know if anything else is needed.

kumarneeraj2005 avatar Apr 29 '24 12:04 kumarneeraj2005

@kumarneeraj2005

@Laughing-q discovered the issue occurs when there is an empty array of keypoints in the dataset and created a PR to fix this. Thank you for bringing this to our attention. I will update you as soon as we merge and deploy the fix. Please accept our apologies for the inconvenience caused.

Alternatively, you can remove the 3 background images from your dataset if you do not want to wait for the fix to be merged and deployed.
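
A workaround sketch for removing backgrounds, assuming a standard YOLO layout where a background image has a missing or empty label file (this helper is hypothetical, not an Ultralytics API):

```python
# Hypothetical workaround: move "background" images (images whose YOLO
# label file is missing or empty) to a quarantine folder so they are
# excluded from the upload.
import shutil
from pathlib import Path

def quarantine_backgrounds(images_dir, labels_dir, quarantine_dir):
    qdir = Path(quarantine_dir)
    qdir.mkdir(parents=True, exist_ok=True)
    moved = []
    for img in sorted(Path(images_dir).iterdir()):
        if img.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
            continue
        label = Path(labels_dir) / (img.stem + ".txt")
        # A missing or empty label file marks the image as a background.
        if not label.exists() or not label.read_text().strip():
            shutil.move(str(img), str(qdir / img.name))
            moved.append(img.name)
    return moved
```

Moving the files to a quarantine folder (rather than deleting them) makes it easy to restore the backgrounds once the fix is deployed.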

sergiuwaxmann avatar Apr 29 '24 13:04 sergiuwaxmann

@sergiuwaxmann Could you please let me know which 3 background images you are talking about? If possible, give me the image details.

kumarneeraj2005 avatar Apr 29 '24 15:04 kumarneeraj2005

@kumarneeraj2005 Unfortunately, I do not have access to your dataset but I can see in the screenshot you shared that you have 3 backgrounds.

sergiuwaxmann avatar Apr 29 '24 15:04 sergiuwaxmann

@sergiuwaxmann we will wait for your fix

kumarneeraj2005 avatar Apr 30 '24 06:04 kumarneeraj2005

@sergiuwaxmann @yogendrasinghx [Screenshots: upload error on large dataset, 2024-05-01]

Guys, it seems your platform is not ready for production; this is a really serious issue. Check the attached images. When the dataset is small, it is accepted (with the so-called background images) and uploads to your platform, but when it is a large dataset your platform is not able to handle it. I request you to please look into this issue on a priority basis. Otherwise, cancel my Pro membership, and I am really serious.

kumarneeraj2005 avatar May 01 '24 14:05 kumarneeraj2005

@sergiuwaxmann @kumarneeraj2005 The fix PR https://github.com/ultralytics/ultralytics/pull/10415 by @Laughing-q was merged earlier today and is now published in ultralytics 8.2.6. I'll sync up with you to redeploy HUB with these fixes.

@kumarneeraj2005 we should have this fixed soon, thank you for your patience here and helping us diagnose the problem!

glenn-jocher avatar May 01 '24 15:05 glenn-jocher

Hey @kumarneeraj2005, great news! 🎉 The HUB has been fully updated with all the latest fixes from https://github.com/ultralytics/ultralytics/pull/10415, thanks to the examples you provided for debugging. 🛠️

Please give your dataset upload and training another go, and don't hesitate to reach out if you encounter any more issues or have suggestions for improvement. Your input is incredibly valuable in enhancing our product. Looking forward to hearing from you! 😊

glenn-jocher avatar May 02 '24 20:05 glenn-jocher

@glenn-jocher Thanks, I am able to upload the dataset now. While training on your cloud, it shows 116 hours for 64K images. Could you please tell me if this is normal behaviour? Is there any way to reduce the training time? And can we have an option to select a better GPU other than your T4? [Screenshot: training time estimate]

kumarneeraj2005 avatar May 03 '24 08:05 kumarneeraj2005

@kumarneeraj2005 I am glad the upload works. Once again, thank you for bringing this to our attention! The estimated remaining training time is adjusted during training (it takes us a few epochs to calculate remaining time more accurately).

sergiuwaxmann avatar May 03 '24 10:05 sergiuwaxmann

@sergiuwaxmann @glenn-jocher Could you please explain why your system is deducting 2 bills for a single model training? [Screenshot: billing]

kumarneeraj2005 avatar May 05 '24 09:05 kumarneeraj2005

@kumarneeraj2005 It looks like you resumed training. This might be a UI issue (not showing the first training session as completed). I will investigate our system and refund the extra charges if the balance was subtracted two times.

sergiuwaxmann avatar May 05 '24 12:05 sergiuwaxmann

@glenn-jocher Thanks, I am able to upload the dataset now. While training on your cloud, it shows 116 hours for 64K images. Could you please tell me if this is normal behaviour? Is there any way to reduce the training time? And can we have an option to select a better GPU other than your T4?

@kumarneeraj2005 hi there! About your training time question, yes this is a long time! We are working on supporting newer GPUs soon like NVIDIA L4 GPUs and NVIDIA L40S GPUs from the Ada-Lovelace generation which should be able to complete your training much faster. Hopefully we will have these updates in place over the next few months as we continue to update HUB with the best features and fixes.

glenn-jocher avatar May 05 '24 15:05 glenn-jocher

@kumarneeraj2005 It looks like you resumed training. This might be a UI issue (not showing the first training session as completed). I will investigate our system and refund the extra charges if the balance was subtracted two times.

@sergiuwaxmann any update on this ?

kumarneeraj2005 avatar May 07 '24 07:05 kumarneeraj2005

@kumarneeraj2005 I replied on the other issue you opened.

sergiuwaxmann avatar May 07 '24 07:05 sergiuwaxmann