yolov5 Training on two consecutive images in the context of spatio-temporal learning

Open kristinatel opened this issue 1 year ago • 1 comments

Search before asking

[X] I have searched the YOLOv5 issues and discussions and found no similar questions.

Question

Hi! I am trying to train a model on two consecutive images (frames) rather than one, i.e. a tensor of size WxHx6 rather than WxHx3, and use the label file of the most recent frame. I learned from previous issues that while I can set as many channels as I want in the model yaml, the dataloaders are restricted to 3 channels. I am not sure where to begin with modifying the dataloaders. Do you perhaps have some tips, or could you point me in the right direction?

Thank you!

Additional

No response

Aug 10 '22 09:08 kristinatel

👋 Hello @kristinatel, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email [email protected].

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Google Colab and Kaggle notebooks with free GPU
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Aug 10 '22 09:08 github-actions[bot]

You don't need to touch dataloaders. Just create a simple Python script that iterates over your images and creates 6-channel tiff files out of each pair of images. I suppose that shouldn't require changing YOLO and should work just fine.

EDIT: This idea doesn't work. I changed the dataloader to make it work ...

Aug 10 '22 19:08 pourmand1376

I'm trying to do something similar. Working with more than 3 channels is something of a mess apparently. It's not just the dataloader, the 3 channel limit is sprinkled throughout the code.

I've gotten as far as allowing training on >3 channels, but I can't currently get inferencing working. The hubconf.py custom loader doesn't like custom input configurations.

Aug 16 '22 16:08 mullenba

Have you seen #1739? This comment claims to resolve the issue.

I also have a project which needs multi-channel input. I will test it soon.

Aug 17 '22 06:08 pourmand1376

I pulled down a fresh copy of the repo and moved in my custom dataloader. Adding "ch: 5" to the .yaml file did allow me to train 5 channel images pretty easily. However, later running inference with a trained model is a problem.

One change I've had to make is here. https://github.com/ultralytics/yolov5/blob/e83b422a69bbd69628687b2dc50102c08877505c/models/common.py#L618

The code forces my 5 channel input into 3 channels, which errors out because the model is expecting a 5 channel input.

Aug 17 '22 19:08 mullenba

Aha, I see it. Did you manage to solve the problem by removing this line or changing it somehow?

Aug 17 '22 19:08 pourmand1376

In my case, I can change it to this and make it work, but I think there's a bigger problem that I'm trying to pinpoint. I don't think the model output is the correct size, which is affecting the bounding boxes it returns and the confidence scores.

This needs to bring the number of channels into the function, but I don't have time at the moment to suggest the best way to do that:

im = im[..., :5] if im.ndim == 3 else np.tile(im[..., None], 3)

Aug 17 '22 19:08 mullenba

Ok, I'm looking at this line. https://github.com/ultralytics/yolov5/blob/e83b422a69bbd69628687b2dc50102c08877505c/models/common.py#L632

For my task, I'm passing the system numpy arrays of [960,960,5] and 12 classes. I printed the shape of y as it comes from the model on line 632. It's returning an array of [1,25500,17], which I think could only be from a [640,640,3] input (but I could be mistaken).

I'm getting the size information from https://github.com/ultralytics/yolov5/issues/8554
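
As a rough check on those numbers, here is a minimal sketch of the arithmetic. It assumes the standard YOLOv5 detection head (strides 8, 16 and 32, with 3 anchors per head) and an input whose sides are multiples of 32. `expected_output_shape` is a hypothetical helper written for this illustration, not part of YOLOv5.

```python
def expected_output_shape(height, width, num_classes,
                          strides=(8, 16, 32), anchors_per_head=3, batch=1):
    """Shape of the concatenated Detect output: (batch, predictions, 5 + classes)."""
    # Each head predicts anchors_per_head boxes per cell of its downsampled grid.
    predictions = sum(
        anchors_per_head * (height // s) * (width // s) for s in strides
    )
    # Each prediction holds x, y, w, h, objectness, then one score per class.
    return (batch, predictions, 5 + num_classes)


print(expected_output_shape(640, 640, 12))
print(expected_output_shape(960, 960, 12))
```

Under these assumptions a 640x640 input gives 25200 predictions and a 960x960 input gives 56700. A row count in the 25000 range is therefore close to (though not exactly) what a 640x640 input would produce, and the last dimension of 17 matches 5 + 12 classes.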

Aug 17 '22 20:08 mullenba

> I'm trying to do something similar. Working with more than 3 channels is something of a mess apparently. It's not just the dataloader, the 3 channel limit is sprinkled throughout the code.
>
> I've gotten as far as allowing training on >3 channels, but I can't currently get inferencing working. The hubconf.py custom loader doesn't like custom input configurations.

Are you sure that it is training on the number of bands/channels that you want? If you check LoadImagesAndLabels, you will see that load_image uses cv2.imread() to read the images, so it may be ignoring the additional channels. You need to modify the function to make it work properly. I'm actually surprised that cv2 is not complaining about the images. Which format are you passing?

Aug 19 '22 14:08 Camilochiang

I wrote a custom data loader that creates [960,960,5] arrays. It replaces LoadImagesAndLabels.

Aug 19 '22 14:08 mullenba

I have changed my __getitem__ function in LoadImagesAndLabels to return a [7,512,512] vector. Also, I have set ch:7 in yolov5s.yaml file. However, I get 0 mAP when I do validation. Is there anything I should do to make this work?

@mullenba: I set a print statement at this line at common.py but it seems that it never reaches this point.

Aug 22 '22 08:08 pourmand1376

@glenn-jocher Do you have any idea? I've been trying this for days and still don't have a single clue how to train this model successfully. Are there any tips for training models with more than one channel?

I don't know if 3 channel limit is sprinkled through the code or not ...

Aug 24 '22 08:08 pourmand1376

@pourmand1376 Does it say it's training but you get 0 mAP after when you test? Did you check that your model is using all of the channels? Here's another point forcing 3 channels: https://github.com/ultralytics/yolov5/blob/f0e5a608f50ac647827bede88fded7908c7edeab/train.py#L130

Aug 24 '22 11:08 mullenba

@mullenba Yes. The model trains completely; however, it never reaches anything acceptable. I have also had some training runs with an mAP of 0.0001, if that helps.

Also, I checked the code that you sent. It is actually ignored later, in DetectionModel: https://github.com/ultralytics/yolov5/blob/f0e5a608f50ac647827bede88fded7908c7edeab/models/yolo.py#L162

I didn't check that the model is using all channels. How should I check that?

For the record, this is my model's summary:

08:47:32  |base|pourmand@user01 yolov5 ±|test_dataloader ✗|→ python models/yolo.py --cfg models/yolov5s.yaml --batch 10 --device 0 --profile
models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=0, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-163-gf19d0634 Python-3.9.12 torch-1.12.1+cu102 CUDA:0 (Quadro RTX 8000, 48601MiB)


                 from  n    params  module                                  arguments
  0                -1  1      8128  models.common.Conv                      [7, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7239997 parameters, 7239997 gradients, 17.6 GFLOPs

This is also original yolov5s.yaml summary:

models/yolo: cfg=models/yolov5s.yaml, batch_size=10, device=, profile=True, line_profile=False, test=False
YOLOv5 🚀 v6.2-53-gf0e5a60 Python-3.7.13 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)


                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  Detect                                  [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
YOLOv5s summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs

From above, it is clear that model architecture is not problematic. However there is something wrong when training this beast ...
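
One quick sanity check on the two summaries is to confirm that the parameter difference comes entirely from the first convolution. The sketch below assumes that `models.common.Conv` is a bias-free 2D convolution followed by batch normalization (one weight and one bias per output channel). `conv_params` is a hypothetical helper written for this illustration.

```python
def conv_params(c_in, c_out, kernel):
    """Parameters of a bias-free k x k convolution plus BatchNorm weight and bias."""
    conv_weights = c_in * c_out * kernel * kernel
    batchnorm = 2 * c_out
    return conv_weights + batchnorm


three_ch = conv_params(3, 32, 6)  # layer 0 of the stock yolov5s.yaml
seven_ch = conv_params(7, 32, 6)  # layer 0 with ch: 7
print(three_ch, seven_ch, seven_ch - three_ch)
print(7239997 - 7235389)          # difference between the two summary totals
```

Both differences come to 4608, which is consistent with layer 0 really having 7 input channels. It says nothing about whether the dataloader is feeding meaningful data into the extra channels.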

Aug 24 '22 12:08 pourmand1376

Ok, when you're running the model for validation, how are you initializing it?

For example, here's mine.

model = torch.hub.load('yolov5',  # Folder with my customized repo
                       'custom',  # This affects A LOT
                       path=YOLO_WEIGHTS,  # Path to saved weights
                       source='local',
                       autoshape=True,
                       channels=5,
                       classes=12).to("cuda:0")

Aug 24 '22 12:08 mullenba

Actually I haven't changed the code here. I am using standard syntax to load custom yaml model (Reference).

Something like this:

	python train.py \
		--img-size 512 \
		--weights /mnt/new_ssd/projects/Anevrism/Models/pourmand/yolov5/runs/train/exp141/weights/last.pt \
		--data /mnt/new_ssd/projects/Anevrism/Data/brain_cta/output_folder/database.yaml \
		--hyp data/hyps/hyp.aneurisym.yaml \
		--epochs 200 --batch-size $(batch) --device 0 --save-period 5 --workers 2 \
		--cfg models/yolov5s.yaml

Aug 24 '22 15:08 pourmand1376

You're taking a different approach to running the model than I am, so I'm not quite sure what the issue could be.

Have you checked to see if it's getting the classifications correct? There could be a situation (like I'm currently dealing with) where it's detecting the classes in the image, but the bounding boxes aren't very good.

Aug 24 '22 16:08 mullenba

That's right. Mine is the opposite. My model detects bboxes very well but it doesn't detect classes.

Aug 25 '22 10:08 pourmand1376

@pourmand1376 I made 6 channel tiff files out of my image pairs, added ch: 6 in the model yaml, and upon training I get,

>it/s]imdecode_(''): can't read header: OpenCV(4.6.0) /io/opencv/modules/imgcodecs/src/grfmt_tiff.cpp:152: error:
>(-2:Unspecified error) in function 'int cv::TiffDecoder::normalizeChannelsNumber(int) const'

> Unsupported number of channels:
>     'channels >= 1 && channels <= 4'
> where
>     'channels' is 6

So as @Camilochiang said, OpenCV does complain. Is there a workaround for this?

@mullenba Could you tell us how you modified your dataloader and made the training work? I only wish to train with image pairs of 6-channels but run inference regularly on 3-channel images.

Aug 31 '22 08:08 kristinatel

Tiff method doesn't work. I changed the dataloader to return a [channel_count, ...] vector.

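You can basically do that by renaming the __getitem__ method to getitem_pre and writing your own __getitem__ which returns your vector.

import torch

class LoadImagesAndLabels(Dataset):  # the dataset class in utils/dataloaders.py

    def getitem_pre(self, index):
        ...  # YOLOv5's original __getitem__ body, only renamed

    def __getitem__(self, index):
        last = len(self.im_files) - 1
        # clamp the neighbours at both ends of the dataset
        previous_image = self.getitem_pre(max(index - 1, 0))[0]
        current_image, labels, path, shapes = self.getitem_pre(index)
        after_image = self.getitem_pre(min(index + 1, last))[0]
        # then combine previous image, current image and after image however you want,
        # e.g. by stacking them along the channel axis
        img = torch.cat((previous_image, current_image, after_image), 0)
        return img, labels, path, shapes  # labels, path and shapes of the current frame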

Sep 01 '22 10:09 pourmand1376

@kristinatel I created a custom dataset similar to this.

I then changed the utils.dataloaders file to call my new dataset here, where instead of calling LoadImagesAndLabels, I call my own dataset. https://github.com/ultralytics/yolov5/blob/1aea74cddbc78e7f79dac07090cb157dfc24dbcc/utils/dataloaders.py#L122

Sep 05 '22 19:09 mullenba

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Wiki – https://github.com/ultralytics/yolov5/wiki
Tutorials – https://docs.ultralytics.com/yolov5
Docs – https://docs.ultralytics.com

Access additional Ultralytics ⚡ resources:

Ultralytics HUB – https://ultralytics.com/hub
Vision API – https://ultralytics.com/yolov5
About Us – https://ultralytics.com/about
Join Our Team – https://ultralytics.com/work
Contact Us – https://ultralytics.com/contact

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

Oct 06 '22 00:10 github-actions[bot]

My data has 8 channels. I replaced the dataset in YOLOv5 with a self-made dataset and modified the few places that raised errors because the channel count was hard-coded to 3, but the accuracy was very low during training: after 10 epochs it was still 0. Could you tell me how you modified the data preprocessing steps?

Dec 07 '22 09:12 xiaoche-24

@xiaoche-24 I suppose the preprocessing steps will depend on the nature of your dataset. My use case was that I was trying to concatenate every 2 consecutive images in my dataset and use the annotations of the second image only. First I added the ch: 6 argument in my model yaml. I changed the default def __getitem__ in dataloaders.py to def getitem_pre, and created a new def __getitem__ as follows:

def __getitem__(self, index):
    # Frames 2*index and 2*index + 1 form one pair; only the second frame's labels are kept
    first_frame, _, first_frame_name, _ = self.getitem_pre(2 * index)
    second_frame, labels_out, second_frame_name, shapes = self.getitem_pre(2 * index + 1)
    img_pair = torch.cat((first_frame, second_frame), 0)  # stack along the channel axis
    return img_pair, labels_out, second_frame_name, shapes

I then changed the def __len__ of the LoadImagesAndLabels class to return half the length, since each iteration essentially loads 2 of my images at a time:

def __len__(self):
    return len(self.im_files) // 2

Since the order of my images matters I also made sure to sort my self.im_files numerically before they are loaded. And finally in train.py I set augment and shuffle to false, as well as rect to false in the val_loader. By running train.py my model was able to learn normally.
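
For anyone unsure what sorting numerically means in practice, here is a minimal sketch. It assumes frame order is encoded as integers in the file names; `natural_key` and `frame_pairs` are hypothetical helpers written for this illustration, and the file names are made up.

```python
import re


def natural_key(path):
    """Split a name into text and integer chunks so 'frame_10' sorts after 'frame_2'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", path)]


def frame_pairs(im_files):
    """Return non-overlapping (first, second) pairs, matching indices 2*i and 2*i + 1."""
    ordered = sorted(im_files, key=natural_key)
    return [(ordered[2 * i], ordered[2 * i + 1]) for i in range(len(ordered) // 2)]


files = ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg", "frame_9.jpg"]
print(frame_pairs(files))
```

A plain string sort would put frame_10.jpg before frame_2.jpg and silently pair the wrong frames, which is why the numeric key matters here.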

Dec 07 '22 13:12 kristinatel

> My data has 8 channels. I replaced the dataset in YOLOv5 with a self-made dataset and modified the few places that raised errors because the channel count was hard-coded to 3, but the accuracy was very low during training: after 10 epochs it was still 0. Could you tell me how you modified the data preprocessing steps?

Basically, I took two images and stacked them together into a (5,960,960) array (I'm working with 5 channels). The big issue is that the default dataloader isn't terribly well documented and isn't clear about when it's changing array format: sometimes it bounces between (channels, height, width) and (height, width, channels). I found that if you aren't careful, you can sometimes pass in an array with the wrong shape and it will still run.
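
One way to guard against that kind of silent shape mix-up is to check the layout explicitly before stacking. The following is only a sketch using NumPy. `stack_frames_chw` is a hypothetical helper, and the test that the channel count must not exceed the smaller spatial side is a heuristic assumption, not something YOLOv5 enforces.

```python
import numpy as np


def stack_frames_chw(first_hwc, second_hwc):
    """Stack two (height, width, channels) frames into one (channels, height, width) array."""
    for frame in (first_hwc, second_hwc):
        # Heuristic: in HWC layout the last axis (channels) is the smallest one.
        if frame.ndim != 3 or frame.shape[2] > min(frame.shape[:2]):
            raise ValueError(f"expected (H, W, C) layout, got shape {frame.shape}")
    if first_hwc.shape != second_hwc.shape:
        raise ValueError("frames must have the same shape")
    stacked = np.concatenate((first_hwc, second_hwc), axis=2)  # (H, W, 2C)
    return np.ascontiguousarray(stacked.transpose(2, 0, 1))    # (2C, H, W)


frame_a = np.zeros((960, 960, 3), dtype=np.uint8)
frame_b = np.ones((960, 960, 3), dtype=np.uint8)
print(stack_frames_chw(frame_a, frame_b).shape)
```

Passing an array that is already in (channels, height, width) order raises an error here instead of running with the axes scrambled.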

Also, if you had to make any changes to the portion of the dataloader that pulls labels and bounding boxes, make sure you have that in the right format too. The code bounces between (x1,y1,x2,y2) and (x,y,w,h) in different places, so if you have the wrong format you'll end up with the zero accuracy issue too.
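
To make the two box formats concrete, here is a minimal pure-Python sketch. It assumes (x,y,w,h) means box centre plus width and height, as in YOLO label files. The function names are hypothetical stand-ins for this illustration; YOLOv5 has its own tensor versions in utils/general.py (xyxy2xywh and xywh2xyxy).

```python
def xyxy_to_xywh(x1, y1, x2, y2):
    """Corner format (x1, y1, x2, y2) -> centre format (x_center, y_center, width, height)."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def xywh_to_xyxy(x, y, w, h):
    """Centre format (x_center, y_center, width, height) -> corner format (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


box = (100, 50, 300, 150)                 # corners, in pixels
print(xyxy_to_xywh(*box))
print(xywh_to_xyxy(*xyxy_to_xywh(*box)))  # round trip back to the corners
```

If a custom dataset hands corner boxes to code that expects centre boxes (or the other way round), training still runs but the targets are wrong, which is one plausible route to the zero-accuracy symptom described above.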

Dec 07 '22 13:12 mullenba

@kristinatel I successfully ran through the 6-channel training code based on your method, but an error was reported during inference. How can I modify the detect.py?

Mar 08 '23 02:03 xiaoche-24

> @kristinatel I successfully ran through the 6-channel training code based on your method, but an error was reported during inference. How can I modify the detect.py?

To run inference you will also need to modify class LoadImages in dataloaders.py, where images are read one by one. It needs to be changed to read two images at once:

else:
    # Read image
    # frame 1
    self.count += 1
    im01 = cv2.imread(path)  # BGR
    ...
    # frame 2
    self.count += 1
    im02 = cv2.imread(path)  # BGR
    ...
.
.
.
else:
    im1 = letterbox(im01, self.img_size, stride=self.stride, auto=self.auto)[0]  # padded resize
    im1 = im1.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    im1 = np.ascontiguousarray(im1)  # contiguous
    # Do the same for im02 to get im2
    im = np.concatenate((im1, im2), axis=0)  # stack the two frames along the channel axis
return path, im, im02, self.cap, s2
# returns concatenated im but im2 filename as predictions will be made on im2

That is if you give detect.py 2 images at a time; if you give it more images, make sure to sort them before they are loaded.

Regarding detect.py, I believe it will give an error at model.warmup(imgsz=(..., 3, ...)), so you can either change the channels to 6 or skip the warmup entirely. Hope this helps.

Mar 08 '23 13:03 kristinatel

> I wrote a custom data loader that creates [960,960,5] arrays. It replaces LoadImagesAndLabels.

May I ask you what method you used to generate this array, and how this 5-channel array and images and labels were called during training respectively?

Oct 27 '23 04:10 wq247726404

> My data has 8 channels. I replaced the dataset in YOLOv5 with a self-made dataset and modified the few places that raised errors because the channel count was hard-coded to 3, but the accuracy was very low during training: after 10 epochs it was still 0. Could you tell me how you modified the data preprocessing steps?

What format is your dataset in?

Oct 27 '23 07:10 wq247726404

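@wq247726404 Hi! First, thank you for the detailed description. The way YOLOv5 loads and processes images and labels may differ depending on the dataset structure. My earlier solution was to stack two images into a (5,960,960) array and modify the loader functions accordingly; I used a self-made dataset and called the images and labels separately during training. Our official documentation has some description of dataset formats, which you can refer to at https://docs.ultralytics.com/yolov5/training/data/. Hope this helps!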

Nov 14 '23 17:11 glenn-jocher
