
End-End Training

SikandAlex opened this issue 3 years ago · 6 comments

The default.py training configuration says (lines 102-112):

# if training 3 tasks end-to-end, set all parameters as True
# Alternating optimization
_C.TRAIN.SEG_ONLY = False           # Only train two segmentation branchs
_C.TRAIN.DET_ONLY = False           # Only train detection branch
_C.TRAIN.ENC_SEG_ONLY = False       # Only train encoder and two segmentation branchs
_C.TRAIN.ENC_DET_ONLY = False       # Only train encoder and detection branch

# Single task 
_C.TRAIN.DRIVABLE_ONLY = False      # Only train da_segmentation task
_C.TRAIN.LANE_ONLY = False          # Only train ll_segmentation task
_C.TRAIN.DET_ONLY = False          # Only train detection task

But the README says:

If you want try alternating optimization or train model for single task, please modify the corresponding configuration in ./lib/config/default.py to True. (As following, all configurations is False, which means training multiple tasks end to end).

# Alternating optimization
_C.TRAIN.SEG_ONLY = False           # Only train two segmentation branchs
_C.TRAIN.DET_ONLY = False           # Only train detection branch
_C.TRAIN.ENC_SEG_ONLY = False       # Only train encoder and two segmentation branchs
_C.TRAIN.ENC_DET_ONLY = False       # Only train encoder and detection branch

# Single task 
_C.TRAIN.DRIVABLE_ONLY = False      # Only train da_segmentation task
_C.TRAIN.LANE_ONLY = False          # Only train ll_segmentation task
_C.TRAIN.DET_ONLY = False          # Only train detection task

So the comment in the Python file says to set these parameters to True for end-to-end training, while the README instructs leaving them all as False for end-to-end training.

I am trying to reproduce your results on an RTX 2080 Ti. I have read your paper several times but did not see any mention of the training configuration you used for your experiments. The default batch size is 24, but I can only fit a batch size of 12 at most in my 11 GB of video memory.
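One way I could approximate the larger effective batch on a single 11 GB card is gradient accumulation. Below is a minimal PyTorch sketch of the idea; model, optimizer, train_loader, and compute_loss are placeholders and not the actual YOLOP training code.

import torch

def train_one_epoch_with_accumulation(model, optimizer, train_loader,
                                       compute_loss, accum_steps=2):
    # Approximate an effective batch of 24 with 2 accumulated batches of 12.
    model.train()
    optimizer.zero_grad()
    for i, (imgs, targets) in enumerate(train_loader):
        loss = compute_loss(model(imgs), targets)  # stand-in for YOLOP's multi-task loss
        (loss / accum_steps).backward()            # scale so the summed gradient matches the mean
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

Note that this halves the number of optimizer steps per epoch, so any scheduler that steps per optimizer update would need adjusting.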

I have been training for several days now. The lane line and drivable area segmentation already look pretty good, but the object detection head is still far too confident and produces far too many predictions.

I ran out of disk space at several points during this experiment and had to resume training from the latest checkpoint a couple of times, which is why there are gaps in the chart.
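To keep that from happening again, a small helper like the following could prune old checkpoints; this is just a sketch on my side, not part of the YOLOP repository (the checkpoint directory and the .pth extension are assumptions).

from pathlib import Path

def prune_checkpoints(ckpt_dir, keep=2):
    # Keep only the `keep` most recent .pth files so long runs do not exhaust the disk.
    ckpts = sorted(Path(ckpt_dir).glob("*.pth"), key=lambda p: p.stat().st_mtime)
    for old in ckpts[:-keep]:
        old.unlink()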

[Chart: train_loss, no smoothing]

[Chart: train_loss, smoothed]

My hyperparameters and dataset were identical to those in the default.py training configuration. I am wondering whether this slow convergence and oscillation in the loss is normal given the diversity and difficulty of the dataset, or whether it comes from poorly chosen hyperparameters such as the learning rate, or from my smaller batch size.
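If the smaller batch turns out to be part of the problem, one rule of thumb I may try is scaling the learning rate linearly with batch size; a tiny sketch with a placeholder base value (not the actual default.py learning rate):

base_lr = 1e-3                                # placeholder, NOT the value from default.py
base_batch, my_batch = 24, 12
scaled_lr = base_lr * my_batch / base_batch   # linear scaling rule -> 5e-4 in this example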

What was the value of your loss at the end of the 240 epochs?

I am sorry to ask so many questions, but it is frustrating to lose days of GPU training time to restarts. If you could guide a fellow student in the right direction, I will be sure to help improve your repository.

Thank you, Alex

SikandAlex · Oct 14 '21

Well, I apologize: I just noticed that your paper mentions you used an NVIDIA GTX TITAN XP, which has the same amount of video memory as my card.

So I am confused about how you managed a batch size of 24 at 640x640 resolution on a card with the same video memory as mine.

SikandAlex · Oct 14 '21

After reading tools/train.py, I think the comment in the README is right and the one in the code file is wrong. When we want to train the network end to end, we should set all of these parameters to False. [screenshot of tools/train.py]
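To make that concrete, this is roughly the behaviour being described, written as an illustrative sketch; it is a paraphrase, not the actual tools/train.py source.

# Illustration only: when every flag is False, no branch is singled out,
# so all three heads are trained jointly (end to end).
def trains_end_to_end(cfg):
    return not any([cfg.TRAIN.SEG_ONLY, cfg.TRAIN.DET_ONLY,
                    cfg.TRAIN.ENC_SEG_ONLY, cfg.TRAIN.ENC_DET_ONLY,
                    cfg.TRAIN.DRIVABLE_ONLY, cfg.TRAIN.LANE_ONLY])

def total_loss(cfg, det_loss, da_seg_loss, ll_seg_loss):
    if trains_end_to_end(cfg):
        return det_loss + da_seg_loss + ll_seg_loss   # joint multi-task objective
    # otherwise only the branch selected by the True flag is optimized
    raise NotImplementedError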

LoveYueForever · Nov 07 '21

I also find that object detection is hard to train well, while the two segmentation branches train fine. In addition, during training I noticed that even when the network freezes the relevant parameters for single-task training, the frozen branches still seem to be affected: for example, when I train only object detection, the two segmentation branches get worse.
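One possible explanation (an assumption on my part, not something confirmed in the YOLOP code): in PyTorch, setting requires_grad = False stops weight updates but does not stop BatchNorm layers from updating their running statistics while the model is in train() mode, so a "frozen" branch can still drift. A minimal sketch of freezing a branch more completely:

import torch.nn as nn

def freeze_branch(branch: nn.Module):
    for p in branch.parameters():
        p.requires_grad = False      # no gradient updates
    for m in branch.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.eval()                 # keep running_mean / running_var fixed as well

This has to be re-applied after every call to model.train(), since that call switches the BatchNorm layers back to training mode.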

hetao828 · Dec 01 '21

Regarding the question of how a batch size of 24 fits in the same amount of video memory: I think the author used two GPUs, not just one. You can see it here: _C.GPUS = (0,1)
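That would also explain the memory budget: with nn.DataParallel, a global batch of 24 is split into 12 images per GPU, which is about what fits in 11 GB at 640x640. A minimal sketch with a stand-in model (the real network would be built by the repository's own code):

import torch
import torch.nn as nn

# Stand-in module; YOLOP itself would be constructed by the repo's own builder.
model = nn.Conv2d(3, 16, kernel_size=3, padding=1)
model = nn.DataParallel(model, device_ids=[0, 1]).cuda()   # mirrors _C.GPUS = (0,1)

x = torch.randn(24, 3, 640, 640).cuda()  # global batch of 24 ...
y = model(x)                             # ... runs as 12 images on each GPU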

thinkthinking · May 16 '22

Hi, I met the same problem as you (the detection head trains poorly while the segmentation branches look good). Did you fix it? Thank you very much! I think this may be caused by the simple detection head in YOLOP; if you want a good result, you have to train for a lot of epochs.

dedellzyw · Jun 13 '22

This is my loss:

[Chart: train_loss, no smoothing]

[Chart: train_loss, smoothed]

dedellzyw · Jun 13 '22