CrossStagePartialNetworks
Need help in setting hyper-parameters
@WongKinYiu I have been trying to set the right hyperparameters for yolov3-spp on the complete Open Images dataset, but after 300-400 iterations the server restarts. I previously trained on 3 of the 601 classes, but in that run I used single-GPU parameters for multi-GPU training, and the dataset was also small, about 1100 images. Now, training on the whole dataset with multi-GPU parameters causes a system reboot.
BTW, how do you calculate the parameters for multi-GPU training? You have already replied to me in previous issues on the @AlexeyAB repo. At its core, my question is how to set burn-in, learning rate, and decay in the cfg file. Setting them as described by Alexey causes the issue, so I changed almost all the parameters back to the single-GPU config except burn-in, but even then the problem persists.
For the above hardware, here is the link to the config I'm using
Please help me out
Thanks
> But now training on whole dataset with Multi GPU parameters causing system reboot,
This is a hardware issue: insufficient power or a hardware bug in the GPU.
Should I upgrade my PSU to meet the power needs? Reducing the image resolution might also work, but that would decrease accuracy, right?
Try training with 2-3 GPUs instead of 4.
@RajashekarY what is your PSU?
Actually, I don't know, @LukeAI. I use this system remotely, so I need to ask the owner 😛 But
> try to train by using 2-3 GPUs instead of 4.

might get the job done.
The peak power draw of a Titan RTX is about 390 W, so you need at least a 1500 W, and preferably a 2000 W, power supply. However, in my experiments, the reboot is usually caused by the mainboard's protection kicking in because a single PSU is not stable enough for multiple GPUs. In our case, we use dual PSUs for a single mainboard.
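To make the power arithmetic above concrete, here is a rough sketch of a power-budget estimate. The 390 W peak per GPU and the GPU counts (2 to 4) come from this thread; the function name `gpu_power_budget`, the 300 W allowance for the rest of the system, and the 20% headroom factor are illustrative assumptions, not measured values or vendor recommendations.

```python
def gpu_power_budget(num_gpus, gpu_peak_w, other_components_w=0, headroom=0.2):
    """Estimate peak system draw and a PSU capacity with some headroom.

    All arguments are in watts except `headroom`, which is a fraction
    (0.2 means 20% above the estimated peak). This is a back-of-the-envelope
    estimate; it does not model transient spikes or PSU efficiency.
    """
    if num_gpus < 0 or gpu_peak_w < 0 or other_components_w < 0 or headroom < 0:
        raise ValueError("all inputs must be non-negative")
    peak_w = num_gpus * gpu_peak_w + other_components_w
    suggested_psu_w = peak_w * (1 + headroom)
    return peak_w, suggested_psu_w


if __name__ == "__main__":
    GPU_PEAK_W = 390        # approximate Titan RTX peak quoted in the thread
    OTHER_COMPONENTS_W = 300  # assumption: CPU, drives, fans, mainboard

    for n in (2, 3, 4):
        peak, suggested = gpu_power_budget(n, GPU_PEAK_W, OTHER_COMPONENTS_W)
        print(f"{n} GPUs: peak ~{peak:.0f} W, suggested PSU ~{suggested:.0f} W")
```

With four GPUs, the cards alone account for 4 × 390 W = 1560 W before any other component is counted, which is consistent with the advice to prefer 2000 W or to drop to 2-3 GPUs. Note that a sufficient total wattage does not address the stability issue described above, which is why a dual-PSU setup may still be needed.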