Slow Training, any Advice?
Hi, I'm using this command line:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 24
I'm trying to understand why it's so slow. I'm getting one epoch per day, while the paper reports about 3 days for 40 epochs. Is this a known issue, or am I missing something? I'd be happy for any advice.
Thanks!
Hi @albert-ba, for now I suggest training on 1 GPU with a batch size of 2. It will take you a couple of hours per epoch. It's still not fast enough, but we're working on it.
Once we’ve sufficiently sped up the code on a single GPU, we’ll start looking into speeding up multi-GPU.
All the best!
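Concretely, that would look something like this (just adapting the command from this thread to a single visible device and a batch size of 2):
CUDA_VISIBLE_DEVICES=0 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 2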
Hi, it doesn't seem to help.
98 hours to finish an epoch...
I know this repo is still under optimization, and I have no complaints at all; you're doing a wonderful job. But if you can think of anything suspicious that I might be doing wrong, please share.
Maybe it's related to these warnings?
What GPUs are you running on? These warnings are known and can be ignored for now; I don't think they're related to your issue at all.
I see that you used the COCO dataset. We usually conduct all our experiments on Cityscapes. Could you try training on Cityscapes with the default args and see what you get?
Hi, I ran into the same problem: I'm getting one epoch per day. I started training with this command line:
CUDA_VISIBLE_DEVICES=0 python train_autodeeplab.py --dataset cityscapes
I use a single Quadro GV100 GPU, training on Cityscapes with a batch size of 2; everything else is left at the defaults. Have you solved this problem @albert-ba? Do you have any advice @NoamRosenberg @iariav?
@fanrupin currently multi-GPU doesn't always run faster. I need to look into this further, but I don't have time right now. If you have time to experiment and find the problem, I would love for you to make a pull request and become a contributor.
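If someone wants a starting point for that experiment, here is a minimal, generic PyTorch sketch (not part of this repo; profile_steps and all of its arguments are hypothetical placeholders) for checking whether epoch time is dominated by data loading or by the forward/backward pass:

import time
import torch

def profile_steps(model, loader, optimizer, criterion, device, max_steps=50):
    # Hypothetical helper: times DataLoader waits vs. GPU compute for a few
    # steps, to show where an epoch's wall-clock time actually goes.
    model.train()
    data_time = compute_time = 0.0
    end = time.time()
    for step, (images, targets) in enumerate(loader):
        data_time += time.time() - end  # time spent waiting on the loader
        images = images.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        t0 = time.time()
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work so timing is honest
        compute_time += time.time() - t0
        end = time.time()
        if step + 1 >= max_steps:
            break
    print(f"data: {data_time:.1f}s, compute: {compute_time:.1f}s over {step + 1} steps")

If data_time dominates, more DataLoader workers or faster storage would help; if compute_time dominates, the bottleneck is the network itself.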
Hello, I also want to use this code on the COCO dataset. I adopted a command line just like yours:
CUDA_VISIBLE_DEVICES=0,1 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 224 --batch-size 24 --gpu-ids 0,1
But I got an error:
Have you encountered this error? If so, could you please tell me how to solve it?
Thanks a lot!
You need to change the crop size.
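For example, something along these lines might work (a guess on my part, not confirmed in this thread: 321 is only a plausible value, since DeepLab-style models commonly use crop sizes of the form 32k + 1):
CUDA_VISIBLE_DEVICES=0,1 python3 train_autodeeplab.py --dataset coco --filter_multiplier 4 --resize 358 --crop_size 321 --batch-size 24 --gpu-ids 0,1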