yolov5
No detections happening with custom dataset/model. Tried 3 Models. and a few different commits from master - am I missing a flag/argument?
Search before asking
- [X] I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
Hey guys, I've been using YOLOv5 for quite some time, and never had an issue like this.
I was about 1000 commits behind master last week, so I decided to pull the latest master, found out about ClearML and set that bad boy up. It's awesome!!
The issue arises with the latest code: none of my detections from a custom dataset appear when processing with detect.py.
I've trained 3 different models to test with, and all three have the same issue: not a single detection/bounding box appears on the video I'm processing.
This same dataset was working fine with the previous versions of Yolov5.
Model configurations w/ custom dataset:
- yolov5n.yaml trained on ClearML remote agent (RTX 3050)
- yolov5s.yaml trained on ClearML remote agent (RTX 3050)
- yolov5s.yaml trained locally (RTX 3080)
Most recent command used to train, to quickly check whether it was a problem with the ClearML models (I usually train for 2000 epochs):
python3 train.py --batch-size 40 --epochs 50 --data clearml://{ID} --cfg yolov5s.yaml --weights '' --name official_small-3080 --cache ram
The ClearML models are trained identically, aside from the batch size, since the card on my agent is an RTX 3050.
Command used to detect with:
python detect.py --weights ~/official_small_3080.pt --conf 0.4 --img-size 640 --source ~/Single_Cam_Video_1.mp4 --view-img --data ~/data.yaml
I've tried setting my --conf down to 0.1, and even that produces no detections.
Is there a flag in the latest code that I'm missing that prevents the detections from happening? I noticed the new vid_stride argument, which speeds up or slows down the video (also, if anyone can explain how to use this correctly I'd be very appreciative), so it's possible I missed a new addition or change that is preventing my detections.
If you need any more information please let me know, I'd be happy to provide what I can!
Additional
No response
👋 Hello @ChaseDDevelopment, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com or email [email protected].
Requirements
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Google Colab and Kaggle notebooks with free GPU
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
Just trained a model using my "local" dataset rather than the ClearML-uploaded dataset, and it seems to be running fine now. Does anyone know what could be causing this issue? The local dataset's .yaml file doesn't use the test key (exported from Roboflow; though they may add that now, the dataset is older) and it detects fine with the commands written above. But when I use clearml-data sync to upload my dataset, that dataset doesn't work at all, even with a 0.01 conf threshold. I tried uploading the dataset to ClearML just now without the test key and it won't train from the dataset, only when it has that key, but then the detection is horrible.
Does anyone have an idea of what could be causing this?
@ChaseDDevelopment --vid-stride 1 (default) uses every video frame, 2 uses every other frame, etc.
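For example (reusing the weights and video paths from your detect command above), running something along these lines should process every other frame of the video:
python detect.py --weights ~/official_small_3080.pt --source ~/Single_Cam_Video_1.mp4 --vid-stride 2 --view-img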
There are no new settings needed to train or predict, but the ClearML integration is rather new, so it may have an issue. Could you verify the bug is reproducible with a common model (i.e. yolov5s.pt) and a common dataset (i.e. COCO128.yaml or VOC.yaml), and provide exact steps to help us reproduce on our end?
@thepycoder could you take a look at this please?
EDIT: BTW the test key should be optional and is not used by default during training even if present. The only way to use your test set is to run python val.py --task test; the default is --task val.
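For example, a command along these lines (with the dataset yaml and weights paths swapped for your own; the paths here are just illustrative) would evaluate the test split:
python val.py --data ~/data.yaml --weights ~/official_small_3080.pt --task test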
Yes, I'll work on that right away. An interesting thing I just noticed is that my "local" dataset has 2598 images and 2598 labels in the train folder, and when a model is trained with that, it works nicely. (This is downloaded directly from Roboflow.)
The one that gets downloaded from my ClearML instance has 4843 images and 2245 labels. So there is definitely something fishy going on with how it uploads or downloads it, or with what ClearML does with the data.
I am going to try their CLI command clearml-data upload rather than sync, and see if it still produces the same results on my dataset. I'll also try coco128.yaml to see if it's reproducible with the YOLOv5 common stuff.
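For anyone who wants to reproduce the comparison, a quick check along these lines works (a minimal sketch; the two root paths are just placeholders for wherever each copy of the dataset is downloaded):
from pathlib import Path

IMG_EXTS = {'.jpg', '.jpeg', '.png', '.bmp'}

def count_train_split(root):
    """Return (num_images, num_labels) for a dataset copy laid out as train/images and train/labels."""
    images = [p for p in Path(root, 'train', 'images').iterdir() if p.suffix.lower() in IMG_EXTS]
    labels = list(Path(root, 'train', 'labels').glob('*.txt'))
    return len(images), len(labels)

# Placeholder paths for the two dataset copies being compared
for name, root in [('Direct Roboflow download', 'datasets/roboflow_copy'),
                   ('ClearML download', 'datasets/clearml_copy')]:
    num_images, num_labels = count_train_split(root)
    print(f'{name}: {num_images} images, {num_labels} labels in the train folder')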
@glenn-jocher Okay so,
With my custom dataset:
When using the clearml-data sync --project YOLOv5 --name {NAME} --folder . command, I got the following results:
- ClearML: 4843 images and 2245 labels in the train folder
- Direct Roboflow download: 2598 images and 2598 labels in the train folder
The dataset shown in ClearML says it has ~8K files.
When using the following commands to get my dataset into ClearML, the following results occur:
clearml-data create --name {NAME} --project YOLOv5
clearml-data add --files .
clearml-data close
- ClearML: 2598 images and 2598 labels in the train folder
- Direct Roboflow download: 2598 images and 2598 labels in the train folder
The dataset shown in ClearML says it has ~6K files, which matches my local dataset.
For both of these commands, my directory structure looked like the following:
├── datasets
│ └── {NAME}
│ ├── data.yaml
│ ├── test
│ │ ├── images
│ │ └── labels
│ ├── train
│ │ ├── images
│ │ └── labels
│ └── valid
│ ├── images
│ └── labels
And the data.yaml looked like the following:
train: ./train/images
val: ./valid/images
test: ./train/images
nc: 4
names: ['{name}', '{name}', '{name}', '{name}']
With the Coco128.yaml dataset:
When using the clearml-data sync --project YOLOv5 --name coco --folder . command, I got the following results:
128 images in folder, 128 labels in folder
When using the following commands to get the COCO128 dataset into ClearML, the following results occur:
clearml-data create --name coco --project YOLOv5
clearml-data add --files .
clearml-data close
128 images in folder, 128 labels in folder
For both these commands the directory structure was as follows:
├── coco
│ ├── coco128.yaml
│ ├── images
│ ├── labels
│ ├── LICENSE
│ └── README.txt
And the coco128.yaml looked like the following:
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# COCO128 dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics
# Example usage: python train.py --data coco128.yaml
# parent
# ├── yolov5
# └── datasets
# └── coco128 ← downloads here (7 MB)
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128 # dataset root dir
train: images/train2017 # train images (relative to 'path') 128 images
val: images/train2017 # val images (relative to 'path') 128 images
test: # test images (optional)
# Classes
names:
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
# Download script/URL (optional)
download: https://ultralytics.com/assets/coco128.zip
So it seems like it was an error with how ClearML synced my custom dataset, or with how my dataset was structured from Roboflow.
The coco dataset seems to work as expected with both sets of commands.
I was able to train a model from ClearML with my custom dataset when not using sync and it works as expected now.
Hi @ChaseDDevelopment, thanks a ton for the detailed write-up and follow-up work!
Glad to hear you got to move forward using the normal dataset create commands, but I'd still want to get to the bottom of this, because it does not sound like normal behaviour :)
Sync will create a dataset that's based on the previous one with the same name (the idea being you have a single source of truth: the local folder). Could it be that a previous version, which might have been incomplete, was the issue here? We should be able to check by syncing using a new, non-existing project name and checking if the result is still the same.
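For example, something like the command below (where YOLOv5-fresh is just a placeholder for a project name that doesn't exist yet) should tell us whether a stale previous version is to blame:
clearml-data sync --project YOLOv5-fresh --name {NAME} --folder .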
In the meantime, I'm making a test case for this to see if I can reproduce your issue. I'll update here, if that's OK with you.
Thanks again!
Update: I can't reproduce the issue, using this code:
import os
import uuid


def generate_dataset():
    # Recreate the reported directory structure with small dummy files
    os.makedirs('datasets', exist_ok=True)
    os.makedirs('datasets/ChaseDDevelopment', exist_ok=True)
    for subset in ['test', 'train', 'valid']:
        for subfolder in ['images', 'labels']:
            folder = f'datasets/ChaseDDevelopment/{subset}/{subfolder}'
            os.makedirs(folder, exist_ok=True)
            for i in range(10):
                with open(f'{folder}/{i}-{uuid.uuid4()}.txt', 'w') as f:
                    f.writelines(['Hello World!', str(uuid.uuid4())])


if __name__ == '__main__':
    generate_dataset()
I can generate your dataset structure, and immediately calling clearml-data sync --project ChaseDDevelopment1 --name ChaseDDevelopment --folder . worked like I would expect. Can you maybe share a screenshot of your dataset history (if you're allowed to share it, of course)?
Unfortunately I can't share my dataset history.
It was a fresh curl of the dataset for YOLOv5, the same one I trained with 2 weeks ago on the version of YOLO I had that was 1000 commits out of date. I had deleted it and re-downloaded it with the curl command. If you can't reproduce it, it's not a major issue then, and was probably an issue somewhere on my end.
I do appreciate the time spent on this though, so thank you.
I think this can probably be closed unless there is anything else I could provide other than my dataset history?
@ChaseDDevelopment
No worries! If you don't have the datasets in ClearML anymore, there doesn't seem to be much more we can do for now. Thanks for putting in the time and effort though, we'll keep our eyes open for weird sync behaviour.
Feel free to ping me directly in an issue if you find something else not working or behaving strangely with ClearML in the future; we're always looking to make it better for everyone :D
@glenn-jocher I think this issue can indeed be closed.
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Access additional YOLOv5 🚀 resources:
- Wiki – https://github.com/ultralytics/yolov5/wiki
- Tutorials – https://docs.ultralytics.com/yolov5
- Docs – https://docs.ultralytics.com
Access additional Ultralytics ⚡ resources:
- Ultralytics HUB – https://ultralytics.com/hub
- Vision API – https://ultralytics.com/yolov5
- About Us – https://ultralytics.com/about
- Join Our Team – https://ultralytics.com/work
- Contact Us – https://ultralytics.com/contact
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
Thanks, @thepycoder! Your feedback and collaboration are what help drive our community and projects forward. We'll continue to keep an eye on this and appreciate you bringing it to our attention. Feel free to reach out if you encounter anything else in the future. 👍
Closing the issue now.