yolov5
Training on two consecutive images in the context of spatio-temporal learning
Search before asking
- [X] I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
Hi! I am trying to train a model with two consecutive images (frames) rather than one, i.e. a tensor of size WxHx6 rather than WxHx3, and use the label file of the most recent frame. I learned from previous issues that while I can create as many channels as I want in the model yaml, the dataloaders are restricted to 3 channels. I am not sure where to begin modifying the dataloaders. Do you have any tips, or could you point me in the right direction?
Thank you!
Additional
No response
👋 Hello @kristinatel, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://ultralytics.com.
Requirements
Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Google Colab and Kaggle notebooks with free GPU
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
You don't need to touch dataloaders. Just create a simple Python script that iterates over your images and creates 6-channel tiff files out of each pair of images. I suppose that shouldn't require changing YOLO and should work just fine.
EDIT: This idea doesn't work. I changed the dataloader to make it work ...
I'm trying to do something similar. Working with more than 3 channels is something of a mess apparently. It's not just the dataloader, the 3 channel limit is sprinkled throughout the code.
I've gotten as far as allowing training on >3 channels, but I can't currently get inferencing working. The hubconf.py custom loader doesn't like custom input configurations.
Have you seen #1739? This comment claims to resolve the issue.
I also have a project which needs multi-channel input. I will test it soon.
I pulled down a fresh copy of the repo and moved in my custom dataloader. Adding "ch: 5" to the .yaml file did allow me to train 5 channel images pretty easily. However, later running inference with a trained model is a problem.
One change I've had to make is here. https://github.com/ultralytics/yolov5/blob/e83b422a69bbd69628687b2dc50102c08877505c/models/common.py#L618
The code forces my 5 channel input into 3 channels, which errors out because the model is expecting a 5 channel input.
Aha, I see it. Did you manage to solve the problem by removing this line or changing it somehow?
In my case, I can change it to this and make it work, but I think there's a bigger problem that I'm trying to pinpoint. I don't think the model output is the correct size, which is affecting the bounding boxes it returns and the confidence scores.
This needs to bring in the number of channels into the function, but I don't have time at the moment to suggest the best way to do that.
im = im[..., :5] if im.ndim == 3 else np.tile(im[..., None], 3)
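One way to bring the channel count into that line, as the comment above suggests, is to wrap it in a small helper that takes the expected count as a parameter. This is only a sketch: `enforce_channels` and its `ch` argument are hypothetical names, not part of YOLOv5, and it assumes HWC NumPy input as in the line above.

```python
import numpy as np


def enforce_channels(im, ch):
    """Return an HWC array with exactly `ch` channels (hypothetical helper)."""
    if im.ndim == 2:
        # Grayscale HW input: repeat the single plane `ch` times.
        return np.tile(im[..., None], ch)
    if im.shape[-1] < ch:
        raise ValueError(f"expected at least {ch} channels, got {im.shape[-1]}")
    # Drop any extra trailing channels (for example an alpha plane).
    return im[..., :ch]


five_channel = np.zeros((960, 960, 5), dtype=np.uint8)
gray = np.zeros((960, 960), dtype=np.uint8)
print(enforce_channels(five_channel, 5).shape)  # (960, 960, 5)
print(enforce_channels(gray, 5).shape)          # (960, 960, 5)
```

Where `ch` should come from (the model yaml, the loaded weights, or a function argument) is left open here, as it is in the comment above.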
Ok, I'm looking at this line. https://github.com/ultralytics/yolov5/blob/e83b422a69bbd69628687b2dc50102c08877505c/models/common.py#L632
For my task, I'm passing the system numpy arrays of [960,960,5] and 12 classes. I printed the shape of y as it comes from the model on line 632. It's returning an array of [1,25500,17], which I think could only be from a [640,640,3] input (but I could be mistaken).
I'm getting the size information from https://github.com/ultralytics/yolov5/issues/8554
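The expected output size can be checked by hand. Assuming the stock three detection scales (strides 8, 16, and 32) with 3 anchors each, the model returns one row per anchor per grid cell, and each row holds 4 box values, 1 objectness score, and one score per class. The function below is a made-up helper for illustration, not a YOLOv5 API:

```python
def expected_output_shape(height, width, num_classes, strides=(8, 16, 32), anchors_per_scale=3):
    """Rows = anchors * grid cells summed over scales; columns = 4 box + 1 objectness + classes."""
    rows = sum(anchors_per_scale * (height // s) * (width // s) for s in strides)
    return rows, 5 + num_classes


print(expected_output_shape(640, 640, 12))  # (25200, 17)
print(expected_output_shape(960, 960, 12))  # (56700, 17)
```

Under these assumptions a 960x960 input would give 56700 rows, so a row count close to 25200 is consistent with the input being resized to roughly 640x640 somewhere before the forward pass.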
Are you sure it is training on the number of bands/channels that you want? If you check LoadImagesAndLabels, you will see that load_image uses cv2.imread() to read the images, so it may be ignoring the additional channels. You need to modify the function to make it work properly. I'm actually surprised that cv2 is not complaining about the images. Which format are you passing?
I wrote a custom data loader that creates [960,960,5] arrays. It replaces LoadImagesAndLabels.
I have changed my __getitem__ function in LoadImagesAndLabels to return a [7,512,512] vector. I have also set ch: 7 in the yolov5s.yaml file. However, I get 0 mAP when I run validation. Is there anything else I should do to make this work?
@mullenba: I set a print statement at this line in common.py, but it seems that execution never reaches this point.
@glenn-jocher Do you have any idea? I've been trying this for days and still don't have a single clue how to train this model successfully. Are there any tips for training models with more than 3 channels?
I don't know if 3 channel limit is sprinkled through the code or not ...
@pourmand1376 Does it say it's training, but then you get 0 mAP when you test afterwards? Did you check that your model is using all of the channels? Here's another point forcing 3 channels: https://github.com/ultralytics/yolov5/blob/f0e5a608f50ac647827bede88fded7908c7edeab/train.py#L130
@mullenba Yes. The model trains to completion, but it never reaches anything acceptable. I have also had some training runs with an mAP of 0.0001, if that helps.
Also, I checked out the code that you sent. That setting is actually ignored later, in DetectionModel:
https://github.com/ultralytics/yolov5/blob/f0e5a608f50ac647827bede88fded7908c7edeab/models/yolo.py#L162
I didn't check that the model is using all channels. How should I check that?
For the record, this is my model's summary:
08:47:32 |base|pourmand@user01 yolov5 ±|test_dataloader ✗|→ python models/yolo.py --cfg models/yolov5s.yaml --batch 10 --device 0 --profile
models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=0, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-163-gf19d0634 Python-3.9.12 torch-1.12.1+cu102 CUDA:0 (Quadro RTX 8000, 48601MiB)
from n params module arguments
0 -1 1 8128 models.common.Conv [7, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7239997 parameters, 7239997 gradients, 17.6 GFLOPs
This is also original yolov5s.yaml summary:
models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-53-gf0e5a60 Python-3.7.13 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)
from n params module arguments
0 -1 1 3520 models.common.Conv [3, 32, 6, 2, 2]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 2 115712 models.common.C3 [128, 128, 2]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 3 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 1182720 models.common.C3 [512, 512, 1]
9 -1 1 656896 models.common.SPPF [512, 512, 5]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs
From above, it is clear that model architecture is not problematic. However there is something wrong when training this beast ...
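The two summaries can be cross-checked with a little arithmetic. In YOLOv5, `models.common.Conv` is a bias-free `nn.Conv2d` followed by `nn.BatchNorm2d`, so its parameter count is `c_in * c_out * k * k` plus `2 * c_out`. The helper name below is made up for illustration:

```python
def conv_bn_params(c_in, c_out, k):
    """Parameters of a bias-free k x k convolution followed by batch norm (weight + bias)."""
    return c_in * c_out * k * k + 2 * c_out


params_3ch = conv_bn_params(3, 32, 6)
params_7ch = conv_bn_params(7, 32, 6)
print(params_3ch, params_7ch)   # 3520 8128
print(params_7ch - params_3ch)  # 4608
print(7239997 - 7235389)        # 4608
```

The whole difference between the two models sits in layer 0, which suggests the 7-channel first layer was built as intended. It says nothing about whether the data reaching that layer is correct.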
Ok, when you're running the model for validation, how are you initializing it?
For example, here's mine.
model = torch.hub.load('yolov5', # Folder with my customized repo
'custom', # This affects A LOT
path=YOLO_WEIGHTS, # Path to saved weights
source='local',
autoshape=True,
channels=5,
classes=12).to("cuda:0")
Actually I haven't changed the code here. I am using standard syntax to load custom yaml model (Reference).
Something like this:
python train.py \
--img-size 512 \
--weights /mnt/new_ssd/projects/Anevrism/Models/pourmand/yolov5/runs/train/exp141/weights/last.pt \
--data /mnt/new_ssd/projects/Anevrism/Data/brain_cta/output_folder/database.yaml \
--hyp data/hyps/hyp.aneurisym.yaml \
--epochs 200 --batch-size $(batch) --device 0 --save-period 5 --workers 2 \
--cfg models/yolov5s.yaml
You're taking a different approach to running the model than I am, so I'm not quite sure what the issue could be.
Have you checked to see if it's getting the classifications correct? There could be a situation (like I'm currently dealing with) where it's detecting the classes in the image, but the bounding boxes aren't very good.
That's right. Mine is the opposite. My model detects bboxes very well but it doesn't detect classes.
@pourmand1376 I made 6-channel tiff files out of my image pairs, added ch: 6 in the model yaml, and upon training I get,
>it/s]imdecode_(''): can't read header: OpenCV(4.6.0) /io/opencv/modules/imgcodecs/src/grfmt_tiff.cpp:152: error:
>(-2:Unspecified error) in function 'int cv::TiffDecoder::normalizeChannelsNumber(int) const'
> Unsupported number of channels:
> 'channels >= 1 && channels <= 4'
> where
> 'channels' is 6
So as @Camilochiang said, OpenCV does complain. Is there a workaround for this?
@mullenba Could you tell us how you modified your dataloader and made the training work? I only wish to train with image pairs of 6-channels but run inference regularly on 3-channel images.
The tiff method doesn't work. I changed the dataloader to return a [channel_count, ...] vector.
You can basically do that by renaming the __getitem__ method to getitem_pre and writing your own __getitem__ which returns your vector.
class DataLoader():
    def getitem_pre(self, index):
        # YOLOv5's original __getitem__ method, renamed
        ...

    def __getitem__(self, index):
        previous_image = self.getitem_pre(index - 1)
        current_image = self.getitem_pre(index)
        after_image = self.getitem_pre(index + 1)
        # then combine the previous, current, and next images however you want
        combined = ...  # your own stacking logic goes here
        return combined
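One thing to watch in the sketch above: at `index = 0`, `index - 1` is `-1`, which Python treats as the last element, and at the last index, `index + 1` runs past the end. A minimal way to handle the boundaries, assuming you are happy to repeat the edge frame, is to clamp the neighbour indices. `neighbor_indices` is a hypothetical helper, not something in YOLOv5:

```python
def neighbor_indices(index, length):
    """Return (previous, current, next) indices, clamped to [0, length - 1]."""
    if not 0 <= index < length:
        raise IndexError(f"index {index} out of range for length {length}")
    previous_index = max(index - 1, 0)
    next_index = min(index + 1, length - 1)
    return previous_index, index, next_index


print(neighbor_indices(0, 5))  # (0, 0, 1)
print(neighbor_indices(2, 5))  # (1, 2, 3)
print(neighbor_indices(4, 5))  # (3, 4, 4)
```

This does not account for boundaries between separate video sequences stored in the same folder; those would need their own handling.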
@kristinatel I created a custom dataset similar to this.
I then changed the utils.dataloaders file to call my new dataset here, where instead of calling LoadImagesAndLabels, I call my own dataset. https://github.com/ultralytics/yolov5/blob/1aea74cddbc78e7f79dac07090cb157dfc24dbcc/utils/dataloaders.py#L122
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
My data has 8 channels. I replaced the YOLOv5 dataset with a custom one and patched the few places that raised errors because the channel count was hard-coded to 3, but accuracy was very low during training: after 10 epochs it was 0. Could you tell me how you modified the data preprocessing steps?
@xiaoche-24 I suppose the preprocessing steps will depend on the nature of your dataset. My use case was that I was trying to concatenate every 2 consecutive images in my dataset and use the annotations of the second image only. First I added the ch: 6 argument in my model yaml.
I changed the default def __getitem__ in dataloaders.py to def getitem_pre, and created a new def __getitem__ as follows:
def __getitem__(self, index):
    first_frame, _, first_frame_name, _ = LoadImagesAndLabels.getitem_pre(self, 2 * index)
    second_frame, labels_out, second_frame_name, shapes = LoadImagesAndLabels.getitem_pre(self, 2 * index + 1)
    img_pair = torch.cat((first_frame, second_frame), 0)
    return img_pair, labels_out, second_frame_name, shapes
I then changed the def __len__ of the LoadImagesAndLabels class to return half the length, since each iteration loads two of my images at a time:
def __len__(self):
    return len(self.im_files) // 2
Since the order of my images matters I also made sure to sort my self.im_files numerically before they are loaded. And finally in train.py I set augment and shuffle to false, as well as rect to false in the val_loader. By running train.py my model was able to learn normally.
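The numeric sorting and pairing described above can be sketched in a few lines. This is an illustration only: the file names and both helper names are invented, and it assumes frame order is encoded as a number in the file name.

```python
import re


def numeric_key(filename):
    """Sort key that compares digit runs as integers, so 'frame_10' sorts after 'frame_2'."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", filename)]


def make_pairs(filenames):
    """Group sorted frames into non-overlapping (first, second) pairs; an odd leftover is dropped."""
    ordered = sorted(filenames, key=numeric_key)
    return [(ordered[2 * i], ordered[2 * i + 1]) for i in range(len(ordered) // 2)]


files = ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg", "frame_9.jpg", "frame_11.jpg"]
print(make_pairs(files))
# [('frame_1.jpg', 'frame_2.jpg'), ('frame_9.jpg', 'frame_10.jpg')]
```

A plain string sort would put frame_10.jpg before frame_2.jpg, which would silently pair the wrong frames. The pairing mirrors the `2 * index` and `2 * index + 1` indexing and the halved `__len__` above.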
Basically, I took two images and stacked them together into a (5,960,960) array (I'm working with 5 channels). The big issue is that the default dataloader isn't terribly well documented, and it isn't clear when it changes the array format: sometimes it bounces between (channels, height, width) and (height, width, channels). I found that if you aren't careful, you can sometimes pass in an array with the wrong shape and it will still run.
Also, if you had to make any changes to the portion of the dataloader that pulls labels and bounding boxes, make sure you have that in the right format too. The code bounces between (x1,y1,x2,y2) and (x,y,w,h) in different places, so if you have the wrong format you'll end up with the zero accuracy issue too.
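For reference, the two box formats convert into each other as follows. This is a plain-Python sketch with made-up function names, assuming (x, y) in (x,y,w,h) is the box center, as in YOLO label files:

```python
def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) corners -> (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def xywh_to_xyxy(box):
    """(x_center, y_center, width, height) -> (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


corners = (100.0, 200.0, 300.0, 260.0)
centered = xyxy_to_xywh(corners)
print(centered)                # (200.0, 230.0, 200.0, 60.0)
print(xywh_to_xyxy(centered))  # (100.0, 200.0, 300.0, 260.0)
```

Feeding one format where the other is expected still produces numbers that look like boxes, so a round-trip check like this on a few labels is a cheap way to spot a mix-up.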
@kristinatel I successfully ran through the 6-channel training code based on your method, but an error was reported during inference. How can I modify the detect.py?
To run inference you will also need to modify class LoadImages in dataloaders.py, where images are read one by one. It needs to be changed to read two images at once:
else:
    # Read image
    # frame 1
    self.count += 1
    im01 = cv2.imread(path)  # BGR
    ...
    # frame 2
    self.count += 1
    im02 = cv2.imread(path)  # BGR
    ...
.
.
.
else:
    im1 = letterbox(im01, self.img_size, stride=self.stride, auto=self.auto)[0]  # padded resize
    im1 = im1.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    im1 = np.ascontiguousarray(im1)  # contiguous
    # Do the same for im02 to get im2
im = np.concatenate((im1, im2), axis=0)
# Return the concatenated im but the second frame's path and image, as predictions are made on frame 2
return path, im, im02, self.cap, s2
This assumes you give detect.py two images at a time. If you give it more, make sure to sort the images before they are loaded.
Regarding detect.py, I believe it will raise an error at model.warmup(imgsz=(..., 3,...)), so you can either change the channels to 6 or skip the warmup entirely.
Hope this helps.
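The per-frame conversion and the final concatenation can be tried in isolation on dummy arrays. The sketch below leaves out the letterbox step and uses made-up helper names; it assumes both frames are HWC BGR arrays of the same size, as returned by cv2.imread.

```python
import numpy as np


def to_chw_rgb(frame_bgr):
    """HWC BGR -> contiguous CHW RGB, the same transform applied to each frame above."""
    return np.ascontiguousarray(frame_bgr.transpose((2, 0, 1))[::-1])


def stack_frames(first_bgr, second_bgr):
    """Concatenate two converted frames along the channel axis into one 6-channel array."""
    if first_bgr.shape != second_bgr.shape:
        raise ValueError(f"frame shapes differ: {first_bgr.shape} vs {second_bgr.shape}")
    return np.concatenate((to_chw_rgb(first_bgr), to_chw_rgb(second_bgr)), axis=0)


first = np.zeros((4, 6, 3), dtype=np.uint8)       # dummy frame 1, all black
second = np.full((4, 6, 3), 255, dtype=np.uint8)  # dummy frame 2, all white
pair = stack_frames(first, second)
print(pair.shape)  # (6, 4, 6)
```

Channels 0 to 2 of the result hold the first frame and channels 3 to 5 the second, matching the order used by torch.cat in the training loader earlier in the thread.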
I wrote a custom data loader that creates [960,960,5] arrays. It replaces LoadImagesAndLabels.
May I ask you what method you used to generate this array, and how this 5-channel array and images and labels were called during training respectively?
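What format is your dataset in?
@wq247726404 Hi! First, thank you for the detailed description. The way YOLOv5 loads and processes images and labels can vary depending on the dataset structure. My earlier approach was to stack two images into a (5,960,960) array and modify the loader function in the model code accordingly. I used a custom dataset and loaded the images and labels separately during training. Our official documentation has some description of dataset formats that you can refer to: https://docs.ultralytics.com/yolov5/training/data/. Hope this helps!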