
I only have one GPU(GTX1060)

Open hello-piger opened this issue 5 years ago • 6 comments

I only have one GPU (GTX 1060). Can I do distributed training with the following script?

python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x

hello-piger avatar Sep 02 '19 01:09 hello-piger

@hello-piger Yes, you can use that command line, but you will not get any benefit from distributed training.

tianzhi0549 avatar Sep 02 '19 03:09 tianzhi0549
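Since --nproc_per_node=1 starts only a single process, the launcher adds nothing here; an equivalent plain single-GPU invocation (a sketch built from the command quoted in the question above, not a separate recommendation) would be:

```shell
# Sketch: the same training run without the distributed launcher.
# With --nproc_per_node=1 only one worker process is started, so this
# plain call is equivalent (paths and overrides copied from the question).
python tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```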

OK, thank you for your reply!

hello-piger avatar Sep 02 '19 03:09 hello-piger

What should I do if I want random initialization instead of using a pretrained network?

LIUhansen avatar Dec 03 '19 03:12 LIUhansen
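If the config follows maskrcnn-benchmark conventions (an assumption — check the keys in your own yaml), the pretrained backbone is selected by MODEL.WEIGHT, and clearing it should give random initialization, e.g.:

```shell
# Sketch: train from scratch by clearing the pretrained-weights entry.
# MODEL.WEIGHT is the maskrcnn-benchmark-style config key (assumption);
# an empty string skips loading the ImageNet-pretrained backbone.
python tools/train_net.py \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    MODEL.WEIGHT "" \
    OUTPUT_DIR training_dir/fcos_scratch
```

Note that detectors trained from scratch generally need a considerably longer schedule to approach the accuracy of a pretrained backbone.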

Did you change the learning rate, batch size, and other hyperparameters in your training process? I train the model on a single GPU with the command "python tools/train_net.py --config-file configs/fcos/fcos_R_50_FPN_1x.yaml DATALOADER.NUM_WORKERS 2 OUTPUT_DIR training_dir/fcos_R_50_FPN_1x". I changed the parameters following maskrcnn-benchmark, but I still see a performance gap, and the total loss stays around 1.0. Do you have any suggestions for me?

sherwincn avatar Aug 17 '20 09:08 sherwincn

Did you solve it later? I trained with one GPU and the loss was also about 1.0. Based on ResNet-50, I changed the hyperparameters as the author described, but the final result was only 35.8.

whalesea avatar Apr 06 '21 06:04 whalesea
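On the single-GPU performance gap: the default 1x schedule assumes 8 GPUs with a total batch of 16, so on one GPU with a smaller batch, the linear scaling rule says to scale the learning rate down and stretch the iteration schedule out by the same factor. A minimal sketch (the reference values below are the usual fcos_R_50_FPN_1x defaults, stated as an assumption — check your own yaml):

```python
# Linear scaling rule sketch for adapting an 8-GPU detection schedule
# to a smaller single-GPU batch. Defaults below are assumed from the
# typical fcos_R_50_FPN_1x config, not taken from this thread.
def scale_schedule(base_lr=0.01, ims_per_batch=16, max_iter=90000,
                   steps=(60000, 80000), new_batch=4):
    """Scale the LR linearly with batch size; stretch iterations inversely."""
    factor = new_batch / ims_per_batch
    return {
        "SOLVER.BASE_LR": base_lr * factor,
        "SOLVER.IMS_PER_BATCH": new_batch,
        "SOLVER.MAX_ITER": round(max_iter / factor),
        "SOLVER.STEPS": tuple(round(s / factor) for s in steps),
    }

cfg = scale_schedule(new_batch=4)
print(cfg)
```

The resulting values can then be passed as command-line overrides to tools/train_net.py in the same KEY VALUE style as the commands above. Even with correct scaling, very small batches can still trail the reported numbers somewhat.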

Hi! What dataset are you training on — COCO, or something else? Also, what model is your GPU?

Kyle-fang avatar Jul 05 '22 11:07 Kyle-fang