FCOS
I only have one GPU (GTX 1060). Can I do distributed training with the following script?
```shell
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port=$((RANDOM + 10000)) \
    tools/train_net.py \
    --skip-test \
    --config-file configs/fcos/fcos_R_50_FPN_1x.yaml \
    DATALOADER.NUM_WORKERS 2 \
    OUTPUT_DIR training_dir/fcos_R_50_FPN_1x
```
@hello-piger Yes, you can use that command, but with only one GPU you will not benefit from distributed training.
OK, thank you for your reply!
What should I do if I want random initialization instead of using a pretrained network?
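One possibility, assuming the maskrcnn-benchmark-style config that FCOS builds on: the pretrained checkpoint is selected by the `MODEL.WEIGHT` config key, so overriding it with an empty string on the command line (e.g. `python tools/train_net.py --config-file ... MODEL.WEIGHT ""`) should skip loading the ImageNet weights and leave the backbone randomly initialized. The sketch below illustrates the override mechanism with a plain dict standing in for the yacs `CfgNode`; the catalog path shown is the usual R-50 default, but verify it against your config.

```python
# Plain-dict stand-in for the yacs CfgNode used by maskrcnn-benchmark/FCOS.
cfg = {"MODEL.WEIGHT": "catalog://ImageNetPretrained/MSRA/R-50"}

def apply_opts(cfg, opts):
    """Apply KEY VALUE pairs from the command line, mirroring
    the effect of cfg.merge_from_list(args.opts)."""
    for key, value in zip(opts[0::2], opts[1::2]):
        cfg[key] = value
    return cfg

# Equivalent of appending `MODEL.WEIGHT ""` to the training command:
apply_opts(cfg, ["MODEL.WEIGHT", ""])
print(cfg["MODEL.WEIGHT"])  # empty string -> no pretrained weights loaded
```

With `MODEL.WEIGHT` empty, the checkpointer has nothing to load, so the model keeps whatever initialization the layer constructors give it.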
Did you change the learning rate, batch size, and other hyperparameters in your training process? I train the model on a single GPU with the command `python tools/train_net.py --config-file configs/fcos/fcos_R_50_FPN_1x.yaml DATALOADER.NUM_WORKERS 2 OUTPUT_DIR training_dir/fcos_R_50_FPN_1x`. I changed the parameters just like in maskrcnn-benchmark, but there is a performance gap: the total loss stays around 1.0. Do you have any suggestions for me?
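For reference, the usual way to adapt the hyperparameters to a smaller batch is the linear scaling rule from maskrcnn-benchmark: scale the learning rate and the iteration count by the ratio of batch sizes. The helper below is a hypothetical sketch, assuming the common FCOS 1x defaults (batch 16 across 8 GPUs, base LR 0.01, 90k iterations); check them against your actual config before use.

```python
def scale_hyperparams(base_lr, base_batch, base_iters, new_batch):
    """Linear scaling rule: LR scales down with the batch size,
    and the iteration count scales up by the same factor so the
    model still sees the same number of images."""
    factor = new_batch / base_batch
    return base_lr * factor, int(base_iters / factor)

# Assumed 1x-schedule defaults, adapted to batch size 2 on one GTX 1060:
lr, iters = scale_hyperparams(base_lr=0.01, base_batch=16,
                              base_iters=90000, new_batch=2)
print(lr, iters)
```

The warmup and LR-decay milestone iterations in the schedule should be stretched by the same factor, otherwise the decays fire far too early relative to training progress.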
Did you solve it later? I trained with one GPU and the loss was also about 1.0. Based on ResNet-50, I changed the hyperparameters according to the author, but the final result was only 35.8.
Hi! Which dataset are you training on, COCO or something else? Also, what GPU model do you have?