yolov7

Training so, so slow

Open CAT1210 opened this issue 1 year ago • 6 comments

My training is also very slow. I set paste_in to 0 and lowered workers to 4, which seems to be my sweet spot: any value lower or higher than 4 takes about 3 hours per epoch. Even at 4, though, a single epoch still takes about an hour and a half. Not sure what else to try. I saw somebody set mosaic to 0, but that seems like something I don't want to do. I'm using an Intel i7-9700K, so I have 8 cores. My custom dataset has 12 classes and about 105K images.

CAT1210 avatar Jul 31 '22 05:07 CAT1210

You could try to use ComputeLoss instead of ComputeLossOTA.

WongKinYiu avatar Jul 31 '22 06:07 WongKinYiu

You need a CUDA-capable GPU (i.e. Nvidia) to train anything larger than a toy dataset in a reasonable time; that's how computer vision works in practice. You can use Google Colab or Kaggle, with their limitations, if you have no access to a credit card. You can also use vast.ai for an affordable price if your dataset is not super secret (Docker is not secure against attacks from the host), try some of the newer services, or use GCP/AWS if money is no object.

guillermo-gabrielli-fer avatar Jul 31 '22 06:07 guillermo-gabrielli-fer

Oh, for sure. I would never try to run YOLO (any version) on anything other than a GPU. I use an RTX 3060; I should have mentioned that in my original post. Now, I apologize, but I need to ask a stupid question: where do I make the change to use ComputeLoss instead of ComputeLossOTA? I see that train.py has both in there, and I tried just changing line 422, but that didn't make a difference, so I assume that wasn't right.

CAT1210 avatar Jul 31 '22 06:07 CAT1210

Pretty sure I found where to make this change, but it didn't make any difference in how quickly an epoch trains for me.

CAT1210 avatar Jul 31 '22 16:07 CAT1210

Same problem here! I'm training on an RTX 2070 with a dataset of 85K images and 17 classes, and it takes ~5 hours per epoch! Is there any way to speed it up? Thanks.

dungdo123 avatar Aug 02 '22 02:08 dungdo123
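
To compare the reports so far on a common scale, the epoch times can be converted to images per second. This is only a rough sketch: the dataset sizes and epoch times are the approximate figures quoted in the comments above, and the helper name `images_per_second` is made up for illustration.

```python
def images_per_second(num_images: int, hours_per_epoch: float) -> float:
    """Average training throughput implied by a reported epoch time."""
    if hours_per_epoch <= 0:
        raise ValueError("hours_per_epoch must be positive")
    return num_images / (hours_per_epoch * 3600.0)


# Approximate figures as reported in this thread.
reports = {
    "RTX 3060, ~105K images, ~1.5 h/epoch": images_per_second(105_000, 1.5),
    "RTX 2070, ~85K images, ~5 h/epoch": images_per_second(85_000, 5.0),
}

for label, rate in reports.items():
    print(f"{label}: {rate:.1f} images/s")
```

A throughput of only a few images per second on a dedicated GPU suggests that something other than the forward/backward pass is limiting the run.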

Same problem! Training on a TITAN RTX, with very low CPU and GPU utilization.

Sadcardation avatar Aug 06 '22 14:08 Sadcardation
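
When both CPU and GPU utilization look low, one way to narrow it down is to time how long the loop waits for the next batch versus how long each training step takes. The sketch below is generic Python, not part of yolov7: `profile_loop`, `slow_loader`, and the sleep-based stand-ins are hypothetical. In a real run, `batches` would be the training dataloader and `step` would wrap the forward/backward pass.

```python
import time
from typing import Callable, Iterable, Tuple


def profile_loop(batches: Iterable, step: Callable[[object], None],
                 max_batches: int = 50) -> Tuple[float, float]:
    """Return (seconds spent waiting for data, seconds spent in step)."""
    wait_s = 0.0
    step_s = 0.0
    it = iter(batches)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time blocked on the data pipeline
        except StopIteration:
            break
        t1 = time.perf_counter()
        # On a GPU, work is queued asynchronously, so a real `step` should
        # synchronize (e.g. torch.cuda.synchronize()) before returning.
        step(batch)
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
    return wait_s, step_s


def slow_loader(n: int, delay: float):
    """Stand-in for a dataloader that is slow to produce batches."""
    for i in range(n):
        time.sleep(delay)
        yield i


wait_s, step_s = profile_loop(slow_loader(5, 0.05), lambda b: time.sleep(0.005))
print(f"waiting for data: {wait_s:.3f}s, in step: {step_s:.3f}s")
```

If the waiting time dominates, the data pipeline (augmentation, disk, worker count) is the likelier bottleneck; if the step time dominates, the loss computation or model itself is.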

I'm training at 12 hours per epoch with an RTX 3080 Ti. Who can help me?!

KelvinHuang66 avatar Aug 09 '22 03:08 KelvinHuang66

Try setting paste_in: 0.00 in the hyp.scratch.yaml file, or replace compute_loss_ota with compute_loss on this line: https://github.com/WongKinYiu/yolov7/blob/main/train.py#L362

Does it help?

AlexeyAB avatar Aug 09 '22 04:08 AlexeyAB
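
For anyone who prefers to script the hyperparameter change rather than edit the file by hand, here is a minimal sketch that rewrites one top-level `key: value` line in YAML text while keeping any trailing comment. The helper `set_hyp_value` is hypothetical, and the sample text uses illustrative values, not the repository's actual defaults.

```python
import re


def set_hyp_value(yaml_text: str, key: str, value) -> str:
    """Set a top-level `key: value` line, preserving a trailing comment."""
    pattern = re.compile(
        rf"^({re.escape(key)}[ \t]*:[ \t]*)([^#\n]*?)([ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", yaml_text
    )
    if count == 0:
        # Key not present: append it as a new line.
        new_text = yaml_text.rstrip("\n") + f"\n{key}: {value}\n"
    return new_text


# Illustrative file contents (values are assumptions, not the real defaults).
sample = "mosaic: 1.0  # image mosaic\npaste_in: 0.15  # image copy paste\n"
print(set_hyp_value(sample, "paste_in", 0.0))
```

To apply it, read the hyp file into a string, pass it through the helper, and write the result back.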

I've been trying it since last week, and it increased the speed a little, but not much! My dataset includes ~85k images of 17 classes, and it took 5 days to complete 18 epochs. Besides, compute_loss_ota is called with 3 parameters, so we have to remove the "imgs" parameter when switching to compute_loss. Does that affect the training?

dungdo123 avatar Aug 09 '22 04:08 dungdo123

Besides, compute_loss_ota is called with 3 parameters, so we have to remove the "imgs" parameter when switching to compute_loss. Does that affect the training?

This is correct.

I added a fix; now you can just set loss_ota: 0 for faster training:

https://github.com/WongKinYiu/yolov7/blob/711a16ba576319930ec59488c604f61afd532d5a/data/hyp.scratch.custom.yaml#L31

AlexeyAB avatar Aug 09 '22 04:08 AlexeyAB
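
A minimal sketch of what a `loss_ota` switch could look like at the call site, based only on what is described above: the OTA loss is called with the images as a third argument, while the plain loss is not. The function `select_loss` and the stand-in loss functions are hypothetical, and treating a missing `loss_ota` key as "use OTA" is an assumption; check train.py at the linked commit for the actual logic.

```python
from typing import Any, Callable, Dict, Tuple


def select_loss(hyp: Dict[str, Any], pred: Any, targets: Any, imgs: Any,
                compute_loss_ota: Callable, compute_loss: Callable) -> Tuple:
    """Call the OTA loss or the plain loss depending on hyp['loss_ota']."""
    use_ota = hyp.get("loss_ota", 1) == 1  # assumption: missing key keeps OTA
    if use_ota:
        return compute_loss_ota(pred, targets, imgs)  # 3 arguments
    return compute_loss(pred, targets)                # "imgs" is dropped


# Stand-ins that only record which loss was called and with how many args.
def fake_loss_ota(pred, targets, imgs):
    return ("ota", 3)


def fake_loss(pred, targets):
    return ("plain", 2)


print(select_loss({"loss_ota": 0}, None, None, None, fake_loss_ota, fake_loss))
print(select_loss({}, None, None, None, fake_loss_ota, fake_loss))
```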

Thanks for your reply, this really helps!

Sadcardation avatar Aug 09 '22 13:08 Sadcardation

I am running YOLOv7 on a custom dataset with 6k images on Colab Pro, but it still takes a lot of time to train. Which parameters need to be changed?

!python train.py --batch 16 --epochs 50 --data new-1/data.yaml --weights 'yolov7_training.pt'

The above command is used for training. Please assist.

Joshnavarma avatar Nov 02 '23 20:11 Joshnavarma