
No detections happening with custom dataset/model. Tried 3 Models. and a few different commits from master - am I missing a flag/argument?

Open ChaseDDevelopment opened this issue 3 years ago • 9 comments

Search before asking

  • [X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

Hey guys, I've been using YOLOv5 for quite some time and have never had an issue like this.

I was about 1000 commits behind master last week, so I decided to pull the latest master, found out about clearml and set that bad boy up. It's awesome!!

The issue arises with the latest code: none of the detections from my custom dataset appear when processing with detect.py.

I've trained 3 different models to test with, and all three of them have the same issues, where not a single detection/bounding box appears on the video I'm processing.

This same dataset was working fine with the previous versions of Yolov5.

Model configurations w/ custom dataset:

  • yolov5n.yaml trained on ClearML remote agent (RTX 3050)
  • yolov5s.yaml trained on ClearML remote agent (RTX 3050)
  • yolov5s.yaml trained locally (RTX 3080)

Most recent command used to train to quickly see if it was a problem with ClearML models (Usually train 2000 epochs):

python3 train.py --batch-size 40 --epochs 50 --data clearml://{ID} --cfg yolov5s.yaml --weights '' --name official_small-3080 --cache ram

The ClearML models are trained identically aside from the batch size due to the card on my agent being an RTX 3050

Command used to detect:

python detect.py --weights ~/official_small_3080.pt --conf 0.4 --img-size 640 --source ~/Single_Cam_Video_1.mp4 --view-img --data ~/data.yaml

I've tried setting my --conf down to 0.1 and not even that allows detections to run.

Is there a flag in the latest code that I'm missing that prevents the detections from happening? I noticed the vid_stride argument, which speeds up or slows down the video (if anyone can explain how to use it correctly, I'd be very appreciative), so it's possible I missed a new addition or change that is preventing my detections.

If you need any more information please let me know, I'd be happy to provide what I can!

Additional

No response

ChaseDDevelopment avatar Sep 18 '22 17:09 ChaseDDevelopment

👋 Hello @ChaseDDevelopment, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email [email protected].

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

github-actions[bot] avatar Sep 18 '22 17:09 github-actions[bot]

I just trained a model using my "local" dataset rather than the dataset uploaded to ClearML, and it seems to be running fine now. Does anyone know what could be causing this issue?

The local dataset's .yaml file doesn't use the test key (it was exported from Roboflow; they may add that key now, but the dataset is older), and it detects fine with the commands written above. But when I use clearml-data sync to upload my dataset, the resulting dataset doesn't work at all, even with a 0.01 conf threshold. I tried uploading the dataset to ClearML just now without the test key and it won't train from the dataset; it only trains when the key is present, but then the detection is horrible.

Does anyone have an idea of what could be causing this?

ChaseDDevelopment avatar Sep 18 '22 17:09 ChaseDDevelopment

@ChaseDDevelopment --vid-stride 1 (default) uses every video frame, 2 uses every other frame, etc.
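To picture what the stride means, here is a minimal sketch. It is not the loader code from YOLOv5: frames_kept is a hypothetical helper, and the real loader may start counting from a different offset. It only illustrates that a stride of N keeps roughly one frame in every N.

```python
from typing import List


def frames_kept(total_frames: int, vid_stride: int = 1) -> List[int]:
    """Return the 0-based indices kept when sampling every `vid_stride`-th frame.

    Hypothetical helper for illustration; it assumes sampling starts at frame 0.
    """
    if vid_stride < 1:
        raise ValueError("vid_stride must be >= 1")
    return list(range(0, total_frames, vid_stride))


print(frames_kept(10, 1))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(frames_kept(10, 2))  # [0, 2, 4, 6, 8]
print(frames_kept(10, 3))  # [0, 3, 6, 9]
```

A larger stride therefore means fewer frames to run inference on, which shortens processing at the cost of skipping frames.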

There are no new settings needed to train or predict, but the ClearML integration is rather new, so it may have an issue. Could you verify that the bug is reproducible with a common model (e.g. yolov5s.pt) and a common dataset (e.g. coco128.yaml or VOC.yaml), and provide exact steps to help us reproduce it on our end?

@thepycoder could you take a look at this please?

EDIT: BTW, the test key should be optional and is not used by default during training even if present. The only way to use your test set is to run python val.py --task test; the default is --task val.

glenn-jocher avatar Sep 18 '22 17:09 glenn-jocher

Yes, I'll work on that right away. An interesting thing I just noticed is that my "local" dataset has 2598 images and 2598 labels in the train folder, and when a model is trained with that, it works nicely. (This is downloaded directly from Roboflow.)

The one that gets downloaded from my clearML instance has 4843 images, and 2245 labels. So there is definitely something fishy going on with how it uploads or downloads it, or what ClearML does with the data.
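A count like this can be scripted. The sketch below assumes the images/labels layout of a YOLO-style split folder, label files named after their image with a .txt extension, and a guessed set of image extensions; check_split is a hypothetical helper, not part of YOLOv5 or ClearML.

```python
from pathlib import Path

# Assumed image extensions; extend this set if the dataset uses others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def check_split(split_dir):
    """Count images and label files in one split folder.

    Returns (number of images, number of label files, sorted names of
    images that have no label file with the same stem).
    """
    split = Path(split_dir)
    images = [p for p in (split / "images").iterdir() if p.suffix.lower() in IMAGE_EXTS]
    label_stems = {p.stem for p in (split / "labels").glob("*.txt")}
    unlabeled = sorted(p.name for p in images if p.stem not in label_stems)
    return len(images), len(label_stems), unlabeled


# Example call (the path is a placeholder):
# n_images, n_labels, unlabeled = check_split("datasets/{NAME}/train")
```

Running it on both copies of the train folder would show whether the extra images have label files at all. YOLOv5 treats images without a label file as background images, so a large unlabeled share is worth investigating.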

I am going to try their CLI command clearml-data upload rather than sync and see if it still produces the same results on my dataset. I'll also try coco128.yaml to see if it's reproducible with the common YOLOv5 datasets.

ChaseDDevelopment avatar Sep 18 '22 17:09 ChaseDDevelopment

@glenn-jocher Okay so,

With my custom dataset:

When using the clearml-data sync --project YOLOv5 --name {NAME} --folder . command, I got the results described above:

  • ClearML: 4843 images and 2245 labels in the train folder
  • Direct Roboflow download: 2598 images and 2598 labels in the train folder

The dataset shown in ClearML says it has ~8K Files

When using the following commands to get my dataset into ClearML, the results are:

clearml-data create --name {NAME} --project YOLOv5
clearml-data add --files .
clearml-data close

  • ClearML: 2598 images and 2598 labels in the train folder
  • Direct Roboflow download: 2598 images and 2598 labels in the train folder

The dataset shown in ClearML says it has ~6K Files - This matches my local dataset.
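To see exactly which files differ between two copies of a dataset (for example the Roboflow download and a copy fetched back from ClearML), a directory comparison along these lines could help. It is only a sketch: it assumes both copies share the same folder layout, and relative_files and compare_copies are hypothetical helper names.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under `root`, relative to it, as POSIX strings."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_copies(reference_dir, other_dir):
    """Return (files only in the reference copy, files only in the other copy), sorted."""
    ref = relative_files(reference_dir)
    other = relative_files(other_dir)
    return sorted(ref - other), sorted(other - ref)


# Example call (both paths are placeholders):
# missing, extra = compare_copies("roboflow_download", "clearml_download")
```

Files listed in the second result exist only in the other copy, which is where unexpected extra images would show up.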

For both of these commands run, my directory structure looked like the following:

├── datasets
│   └── {NAME}
│       ├── data.yaml
│       ├── test
│       │   ├── images
│       │   └── labels
│       ├── train
│       │   ├── images
│       │   └── labels
│       └── valid
│           ├── images
│           └── labels

And the data.yaml looked like the following:

train: ./train/images
val: ./valid/images
test: ./train/images

nc: 4
names: ['{name}', '{name}', '{name}', '{name}']

With the Coco128.yaml dataset:

When using the clearml-data sync --project YOLOv5 --name coco --folder . command, I got the following result:

128 images in folder, 128 labels in folder

When using the following commands to get the COCO128 dataset into ClearML, the result is the same:

clearml-data create --name coco --project YOLOv5
clearml-data add --files .
clearml-data close

128 images in folder, 128 labels in folder

For both these commands the directory structure was as follows:

├── coco
│   ├── coco128.yaml
│   ├── images
│   ├── labels
│   ├── LICENSE
│   └── README.txt

And the coco128.yaml looked like the following:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# COCO128 dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics
# Example usage: python train.py --data coco128.yaml
# parent
# ├── yolov5
# └── datasets
#     └── coco128  ← downloads here (7 MB)


# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush


# Download script/URL (optional)
download: https://ultralytics.com/assets/coco128.zip

So it seems like it was an error with how ClearML sync'd my custom dataset, or with how my dataset from Roboflow was structured.

The coco dataset seems to work as expected with both sets of commands.

I was able to train a model from ClearML with my custom dataset when not using sync and it works as expected now.

ChaseDDevelopment avatar Sep 18 '22 20:09 ChaseDDevelopment

Hi @ChaseDDevelopment, thanks a ton for the detailed write-up and follow-up work!

Glad to hear you got to move forward using the normal dataset create commands, but I'd still want to get to the bottom of this, because it does not sound like normal behaviour :)

Sync will create a dataset that's based on the previous one with the same name (the idea being that you have a single source of truth: the local folder). Could it be that a previous version, which might have been incomplete, was the issue here? We should be able to check by syncing under a new, not yet existing project name and checking whether the result is still the same.

In the meantime, I'm making a testcase for this to see if I can reproduce your issue. I'll update here if that's ok for you.

Thanks again!

thepycoder avatar Sep 19 '22 13:09 thepycoder

Update: I can't reproduce the issue, using this code:

import os
import uuid


def generate_dataset(root='datasets/ChaseDDevelopment'):
    """Create the test/train/valid layout with 10 dummy files per images/labels folder."""
    for subset in ['test', 'train', 'valid']:
        for subfolder in ['images', 'labels']:
            folder = os.path.join(root, subset, subfolder)
            # makedirs also creates any missing parent folders
            os.makedirs(folder, exist_ok=True)
            for i in range(10):
                # Random names and contents so that no two files are identical
                with open(os.path.join(folder, f'{i}-{uuid.uuid4()}.txt'), 'w') as f:
                    f.writelines(['Hello World!', str(uuid.uuid4())])


if __name__ == '__main__':
    generate_dataset()

I can generate your dataset structure, and immediately calling clearml-data sync --project ChaseDDevelopment1 --name ChaseDDevelopment --folder . worked as I would expect. Could you maybe share a screenshot of your dataset history (if you're allowed to share it, of course)?

thepycoder avatar Sep 19 '22 13:09 thepycoder

> I can generate your dataset structure, and immediately calling clearml-data sync --project ChaseDDevelopment1 --name ChaseDDevelopment --folder . worked as I would expect. Could you maybe share a screenshot of your dataset history (if you're allowed to share it, of course)?

Unfortunately I can't share my dataset history.

It was a fresh curl of the dataset from Yolov5, the same one I trained with 2 weeks ago on the version of YOLOv5 I had that was 1000 commits out of date. I had deleted it in order to re-download it with the curl command. If you can't reproduce it, then it's not a major issue and was probably a problem somewhere on my end.

I do appreciate the time spent on this though, so thank you.

I think this can probably be closed unless there is anything else I could provide other than my dataset history?

ChaseDDevelopment avatar Sep 19 '22 15:09 ChaseDDevelopment

@ChaseDDevelopment

No worries! If you don't have the datasets in ClearML anymore, there doesn't seem to be much more we can do for now. Thanks for putting in the time and effort though, we'll keep our eyes open for weird sync behaviour.

Feel free to ping me directly in an issue or so, if you find something else not working or behaving strangely with ClearML in the future, we're always looking to make it better for everyone :D

@glenn-jocher I think this issue can indeed be closed.

thepycoder avatar Sep 20 '22 07:09 thepycoder

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

github-actions[bot] avatar Oct 21 '22 00:10 github-actions[bot]

Thanks, @thepycoder! Your feedback and collaboration are what help drive our community and projects forward. We'll continue to keep an eye on this and appreciate you bringing it to our attention. Feel free to reach out if you encounter anything else in the future. 👍

Closing the issue now.

glenn-jocher avatar Nov 15 '23 15:11 glenn-jocher